Abstract

Non-technical summary. A problem commonly encountered in time-series econometrics
is that of having to relate series with different degrees of persistence, which
arises in a particularly acute form in the empirical finance literature on the
predictability of excess stock returns. This motivates the consideration of nonlinear
predictive regression models, which unlike linear models are capable of relating
series with different dependence properties.

However, a major obstacle to the estimation of these nonlinear models in practice
has been the absence of the theoretical results needed to justify valid inference
in these models. This paper fills this gap in the literature by showing that standard
nonparametric tests (based on kernel regression) provide a means of conducting
inference in nonlinear regression models that is completely robust to the degree of
persistence in the regressor. This is surprising in view of the known difficulties with
the parametric estimation of the linear model in this same context.

Our results thus provide a sound theoretical basis for predictability tests that 
are robust both to the extent of the possible nonlinearity in the model, and to the 
dependence properties of the regressor. 

Technical summary. A significant problem in predictive regression concerns the
invalidity of standard OLS-based inferences when the regressor is highly persistent.
Recent work on nonparametric methods has suggested that inference based on these
may remain valid in this setting. However, existing results are insufficient to support
the conclusion that standard nonparametric testing procedures have the correct
asymptotic size, in the sense of controlling asymptotic null rejection probabilities
uniformly in the parameters describing the persistence of the regressor. We provide
a proof of precisely such a result, thereby establishing the posited validity of these
methods. In the course of doing so, we develop novel technical results concerning
additive functionals of autoregressive processes exhibiting moderate deviations from
a unit root. This leads us to a unified theory for the behaviour of kernel density
estimators within a class of processes that includes both stationary and integrated
processes, and arrays formed from such processes.

1 Introduction

Valid inference in a predictive regression - as distinct from a cross-sectional regression -
faces two distinctive challenges. The first arises when the regressor is strongly serially
dependent. As is now well known, in this case the limiting distribution of OLS is non-pivotal,
being not only non-Gaussian but also depending on the unknown degree of persistence of
the regressor. This renders conventional inferential procedures - such as referring OLS
t-statistics to quantiles of the standard Gaussian distribution - invalid. The second difficulty
concerns the possibility of a relationship between series with strikingly different dependence
properties. For example, when testing for stock return predictability in finance, it is
common to confront a series exhibiting martingale-difference-like behaviour, such as excess
returns, with a candidate predictor that appears to be integrated (or nearly so), such as
the dividend-price ratio. But parametric linear models, though widely used to test for such
predictability, imply that both the regressor and the dependent variable should manifest a
similar degree of persistence - unless, of course, they are entirely unrelated.

The first of these problems has been the subject of a substantial literature, which 
has sought to either: develop procedures capable of handling the non-standard limiting 
distribution of OLS; or to propose novel estimators that remain asymptotically Gaussian, 
regardless of the persistence of the regressor.^ However, since this work has all been carried 
out in a parametric linear regression setting, it goes no way toward addressing the second 
of the two problems noted above. The successful resolution of that problem seems to 
lie with nonlinear regression models, since the application of nonlinear transformations to 
dependent processes has been shown to produce new series with radically different memory 
properties (Marmer, 2008). The absence of any theoretical priors as to the functional form 
of these possible nonlinearities leads us naturally to consider nonparametric methods. 

Some significant steps in this direction were taken in a recent paper by Kasparis,
Andreou, and Phillips (2015, hereafter KAP), who studied the behaviour of kernel regression
estimators - and associated t-statistic-based tests of non-predictability - within a certain
class of strongly dependent regressor processes. Building on earlier work on local time
density estimation by Wang and Phillips (2009a,b), the authors showed that, despite the
assumed strong dependence of the regressor, nonparametric t-statistics enjoy standard
Gaussian limits, exactly as they do when the regressors are weakly dependent. Their result
holds out the prospect that nonparametric methods may be capable of simultaneously
resolving both of the problems identified above: not only do they allow us to estimate
models capable of relating series with differing degrees of persistence, but they also yield
estimates whose limiting distributions (upon studentisation) are apparently unaffected by
the persistence of the regressor.

One would like to be able to conclude that standard nonparametric tests retain their
validity, in a predictive regression, regardless of the extent of the serial correlation
affecting the regressor. Formally, what needs to be shown is that the asymptotic null rejection
probabilities of these tests can be controlled uniformly in the parameters describing the
persistence of the regressor - which in this paper will be summarised by an autoregressive
coefficient $\rho$. But while KAP's results - together with existing results for stationary
(weakly dependent) regressors - are highly suggestive that such control is possible, they
are insufficient to sustain any such claim. What we crucially require - and what is missing
from the existing literature - are results concerning the asymptotics of kernel regression
estimators when regressors are stationary but exhibit 'moderate deviations from a unit
root' (Phillips and Magdalinos, 2007); we shall term such processes mildly integrated.

^See, amongst others, Cavanagh, Elliott, and Stock (1995), Campbell and Yogo (2006), Jansson and
Moreira (2006), Magdalinos and Phillips (2009), Phillips and Lee (2013), and Elliott, Müller, and Watson
(2015).

The uniformity sought in the present paper requires that we consider triangular arrays
of regressor processes, which will be parametrised in terms of $\rho = \rho_n$. Stationary processes
are identified as those for which $\rho_n \to \rho < 1$, whereas local-to-unity processes (the class
considered by KAP) have $\rho_n = 1 + O(n^{-1})$. Mildly integrated processes lie on exactly the
bridge between these two classes, with $\rho_n \to 1$ but $n(1 - \rho_n) \to \infty$. They therefore inherit
some of the properties of both stationary and local-to-unity processes, but are distinct from
both, and their treatment requires the development of some genuinely novel limit theory.
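
The three regimes can be made concrete with a small numerical sketch. The particular parametrisations used below (a coefficient of the form 1 - n^(-a) for mild integration, and 1 + c/n for the local-to-unity case) are illustrative assumptions, chosen only because they satisfy the defining limits; the function names are hypothetical and do not come from the paper.

```python
def rho_stationary(n: int, rho: float = 0.5) -> float:
    """Fixed coefficient below one: n * (1 - rho_n) grows linearly in n."""
    return rho


def rho_mildly_integrated(n: int, a: float = 0.5) -> float:
    """rho_n = 1 - n**(-a), 0 < a < 1: rho_n -> 1 while n * (1 - rho_n) -> infinity."""
    return 1.0 - n ** (-a)


def rho_local_to_unity(n: int, c: float = -5.0) -> float:
    """rho_n = 1 + c / n: n * (rho_n - 1) equals c for every n."""
    return 1.0 + c / n


def scaled_distance(n: int, rho_n: float) -> float:
    """n * (1 - rho_n), the quantity whose limit separates the three regimes."""
    return n * (1.0 - rho_n)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(
            n,
            scaled_distance(n, rho_stationary(n)),
            scaled_distance(n, rho_mildly_integrated(n)),
            scaled_distance(n, rho_local_to_unity(n)),
        )
```

Under these assumed parametrisations the middle column diverges, but more slowly than the first, while the last column stays fixed at -c.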

The first contribution of the present paper is to show that nonparametric t-statistics
remain asymptotically Gaussian when regressors are mildly integrated. This result - in
conjunction with previous work - is sufficient to permit the conclusion that t-statistic-based
tests and confidence intervals have the correct asymptotic size, in the sense that the
relevant null rejection (or coverage) probabilities are controlled uniformly in the degree of
persistence of the regressor (as described by $\rho$; see Section 2). In view of this, nonparametric
inference may be conducted in a predictive regression entirely without regard for the
possible temporal dependence of the regressor, without thereby endangering the validity
of that inference.

Underpinning this finding are some new results concerning the asymptotics of kernel
density estimators under mild integration (see Section 3). The proofs of these rely on a
combination of arguments appropriate to the stationary and local-to-unity cases. Because
the dependence of mildly integrated processes is sufficiently weak, kernel density estimators
converge not to the local time of some limiting process, but to a (non-random) standard
Gaussian probability density. In this respect, mildly integrated processes are more akin
to stationary processes, except for the noted Gaussianity of the limiting density. On the
other hand, they also share the diminished recurrence and slower rates of convergence
characteristic of local-to-unity processes. In combination with previous work, the results
of this paper yield a unified theory for the behaviour of kernel density estimators under all
possible values - and drifting sequences - of the autoregressive parameter $\rho$. We refer to
the possible limits of these estimators as spatial densities, since - depending on $\{\rho_n\}$ - these
may alternately be (non-random) probability densities or (random) local time densities,
but not both simultaneously. The technical results of this paper are likely to prove
useful for the analysis of other inferential problems, beyond the predictive regression setting
considered in this paper.

Proofs of the main results appear in Appendices A-D. Proofs of results that are either
straightforward, or closely related to those that have already appeared in the literature,
are given in the Supplement, which also provides an index of notation.

Notation. All limits are taken as $n \to \infty$ unless otherwise stated. For sequences $\{a_n\}$, $\{b_n\}$:
$a_n \asymp b_n$ denotes $\lim_{n\to\infty} a_n/b_n = c \in \mathbb{R}\setminus\{0\}$, and $a_n \sim b_n$ denotes $\lim_{n\to\infty} a_n/b_n = 1$. For
positive sequences: $a_n \lesssim b_n$ denotes $\limsup_{n\to\infty} a_n/b_n < \infty$ - equivalently, $a_n = O(b_n)$.
$a \lesssim_C b$ denotes $a \le Cb$. For random sequences $\{x_n\}$, $\{y_n\}$: $x_n \lesssim_p y_n$ denotes $x_n = O_p(y_n)$.
Lip denotes the class of (real-valued) Lipschitz continuous functions on $\mathbb{R}$, BL the class
of bounded and Lipschitz functions on $\mathbb{R}$, and BIL the subclass of BL functions that are
Lebesgue integrable. $L^p$ denotes the class of Lebesgue p-integrable functions on $\mathbb{R}$.
$\rightsquigarrow$ denotes weak convergence in the sense of van der Vaart and Wellner (1996), and $\to_{fdd}$ the
convergence of finite-dimensional distributions.

2 Uniform inference in nonparametric predictive regression 

2.1 Data generating process (DGP) 

The inferential problem that motivates our work concerns the nonlinear predictive regression
model studied by KAP. The data generating process is described by

\[ y_t = m(x_{t-1}) + u_t \tag{2.1} \]

where $m$, $\{x_t\}$ and $\{u_t\}$ are as per

Assumption DGP.

DGP1 $m \in \mathcal{M} := \{m_0 : \mathbb{R} \to \mathbb{R} \mid \sup_{x \in \mathbb{R}} |m_0'(x)| \le M\}$ for some $M < \infty$.

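DGP2 $\{\varepsilon_t\}$ is a scalar i.i.d. sequence; $\varepsilon_0$ has characteristic function
$\psi_\varepsilon(\lambda) := \mathbf{E} e^{i\lambda\varepsilon_0}$ satisfying an integrability condition, and a Lebesgue
density $f_\varepsilon \in \mathrm{Lip}$ that is everywhere nonzero; $\mathbf{E}\varepsilon_0 = 0$ and $\mathbf{E}\varepsilon_0^2 = 1$.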

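DGP3 $\{x_t\}$ and $\{v_t\}$ are generated according to

\[ x_t := \rho x_{t-1} + v_t, \qquad v_t := \sum_{k=0}^{\infty} \phi_k \varepsilon_{t-k}, \tag{2.2} \]

with $x_0 = 0$; $\rho \in R := [-1 + \delta, 1]$ for some $\delta > 0$; $\phi_0 \ne 0$; $\sum_{k=0}^{\infty} |\phi_k| < \infty$; and
$\phi := \sum_{k=0}^{\infty} \phi_k \ne 0$.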

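DGP4 $\{u_t\}$ is a martingale difference sequence with respect to $\mathcal{G}_t := \sigma(\{x_s, u_s\}_{s \le t})$; with
$\mathbf{E}[u_t^2 \mid \mathcal{G}_{t-1}] = \sigma_u^2$ a.s., and with a higher-order conditional moment of $u_t$ given $\mathcal{G}_{t-1}$
bounded, uniformly in $t$, by some $C < \infty$ a.s.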

Remark 2.1. The assumption that $f_\varepsilon \in \mathrm{Lip}$ is used only in the stationary region, i.e. where
$\rho < 1$. While this requirement could likely be dispensed with, we have retained it here
so as to facilitate the direct application of results from Wu, Huang, and Huang (2010).
$f_\varepsilon \in \mathrm{Lip}$ is implied, for example, by a suitable integrability condition. The strict
positivity of $f_\varepsilon$ is also assumed merely for convenience, so as to ensure that the stationary
solution to (2.2) has a density that is strictly positive at every $x \in \mathbb{R}$, thereby avoiding the
possibility that we might (inadvertently) attempt to estimate $m(x)$ at points of zero density.

Remark 2.2. DGP3 is cognate with Assumptions 2.3 and 2.4 in KAP, with the key difference
that we do not restrict $\{x_t\}$ to the local-to-unity region, in which $\rho = 1 + c/n$ for some fixed
$c \in \mathbb{R}$, but instead allow the autoregressive parameter $\rho$ to range over the entirety of
$R = [-1 + \delta, 1]$. Our main results easily extend to sequences of parameter spaces of the
form $R = R_n := [-1 + \delta, 1 + c/n]$, for some fixed $c < \infty$. Owing to the initialisation $x_0 = 0$,
the regressor process is nonstationary, regardless of the value of $\rho$. However, when $\rho < 1$,
(2.2) admits a stationary solution, which corresponds to the weak limit of $x_n$ as $n \to \infty$.

$\sum_{k=0}^{\infty} |\phi_k| < \infty$ implies that $\{v_t\}$ is a short-memory process, and so excludes the
long-memory and anti-persistent cases that are also considered in KAP. It is likely that our
results could also be extended to cover these, but we have refrained from considering these
in order to keep the paper to a manageable length.

Remark 2.3. DGP4 implies that the regressor $x_{t-1}$ is exogenous, and ensures that $m$ is
always identified from $m(x) = \mathbf{E}[y_t \mid x_{t-1} = x]$, regardless of the value of $\rho$. If the
model (2.1) were reformulated with $x_t$ in place of $x_{t-1}$, then estimation of $m$ would remain
possible when $\rho = 1$ (and, indeed, if $\rho = \rho_n \to 1$), despite the potential endogeneity of the
regressor (see Wang and Phillips, 2009b). On the other hand, if $\rho < 1$, $m$ would no longer
be identified, and any putative estimate of $m$ would be biased (even asymptotically).

The DGP is thus completely described by the unknown parameters $(m, \rho, \gamma)$, where
$\gamma := (\psi_\varepsilon, \sigma_u^2, \{F_{u_t}\})$ and $F_{u_t}$ denotes the conditional distribution of $u_t$ given $\mathcal{G}_{t-1}$; let $\Gamma$
denote the set of possible values for $\gamma$. Here the regression function $m \in \mathcal{M}$ is the object
of interest, whereas $(\rho, \gamma) \in R \times \Gamma$ are merely nuisance parameters. Let $x \in \mathbb{R}$ be given.
For a hypothesis such as $H_0 : m(x) = \theta$, the subset of the parameter space consistent with
$H_0$ is given by

\[ H(\theta) := \{m \in \mathcal{M} \mid m(x) = \theta\} \times R \times \Gamma, \]

whence the size of a test of $H_0$ depends on its maximum rejection probability over all points
in $H(\theta)$. In keeping with the literature on the parametric predictive regression problem, in
which $\rho \in R$ is a particularly troublesome nuisance parameter - owing to the discontinuity
in the (pointwise) limiting distribution of the OLS estimator at $\rho = 1$ - we shall only seek
to control the (asymptotic) rejection probability of tests of $H_0$ on the smaller set

\[ H^*(\theta) := \{m \in \mathcal{M} \mid m(x) = \theta\} \times R \times \{\gamma\}. \]

(In other words, our asymptotics shall hold $\gamma$ fixed as $n \to \infty$.) Our focus on $H^*(\theta)$,
rather than $H(\theta)$, may be justified by the complications posed, even in the present setting,
by controlling the (asymptotic) rejection probability of a test of $H_0$, uniformly in the
persistence parameter $\rho \in R$. The proof that standard nonparametric testing procedures
indeed achieve such size control requires the development of some genuinely new asymptotic
theory (see Section 3 below). On the other hand, the passage from $H^*(\theta)$ to $H(\theta)$ would
merely call for relatively straightforward array extensions of existing results, along with
those given in this paper.

2.2 Nonparametric estimation and inference 

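For our purposes, a suitable estimator of $m$ in (2.1), at each $x \in \mathbb{R}$, is provided by the
local level (Nadaraya-Watson) kernel regression estimator,

\[ \hat{m}_n(x; h) := \frac{\sum_{t} K_h(x_{t-1} - x)\, y_t}{\sum_{t} K_h(x_{t-1} - x)}, \tag{2.3} \]

where the sums run over the sample, $K : \mathbb{R} \to \mathbb{R}$ is a smooth probability density, $h > 0$
denotes the bandwidth, and $K_h(u) := h^{-1} K(h^{-1} u)$. We shall suppose $h = h_n$, for $\{h_n\}$ a
shrinking bandwidth sequence as per

Assumption SM (smoothing).

SM1 $K \in \mathrm{BIL}$ is positive and compactly supported, with $\int_{\mathbb{R}} K = 1$.

SM2 $h_n > 0$ for all $n$, $h_n = o(1)$ and $h_n^{-1} = o(n^{1/2})$.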
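
A minimal sketch of the local level estimator may help fix ideas. It assumes an Epanechnikov kernel (compactly supported, bounded and Lipschitz, which is one choice compatible with SM1 but not necessarily the one the authors have in mind), and it pairs each y_t with the lagged regressor x_{t-1}. The function names and the data layout are hypothetical.

```python
def epanechnikov(u: float) -> float:
    """Compactly supported kernel on [-1, 1] that integrates to one."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_weights(x_lag, x0: float, h: float):
    """K_h(x_{t-1} - x0) = K((x_{t-1} - x0) / h) / h for each lagged regressor value."""
    return [epanechnikov((xl - x0) / h) / h for xl in x_lag]


def nadaraya_watson(x, y, x0: float, h: float) -> float:
    """Local level estimate of m(x0) from the pairs (x_{t-1}, y_t).

    x and y are equal-length sequences indexed by t; the first y and the
    last x are dropped so that y[t] is matched with x[t - 1].
    """
    x_lag, y_lead = x[:-1], y[1:]
    w = kernel_weights(x_lag, x0, h)
    signal = sum(w)  # the local signal: sum of K_h(x_{t-1} - x0)
    if signal == 0.0:
        raise ValueError("no observations within one bandwidth of x0")
    return sum(wi * yi for wi, yi in zip(w, y_lead)) / signal
```

The `signal` term in the denominator is the quantity whose divergence rate governs how quickly the bandwidth may shrink.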


Remark 2.4. The persistence of $x_t$, as summarised by $\rho$, is intimately connected with
the recurrence properties of $x_t$, by which we mean the rate at which the local signal
$S_n(x; h) := \sum_{t=1}^{n} K_h(x_t - x)$ diverges, for each fixed $x \in \mathbb{R}$. As is well known, when $h$
is fixed, in the stationary region ($\rho_n \to \rho < 1$) $S_n$ diverges at rate $n$ (probabilistically);
whereas in the local-to-unity region ($\rho_n = 1 + O(n^{-1})$), this rate is reduced to $n^{1/2}$.
Mildly integrated processes are strictly intermediate between these cases, corresponding to
a growth rate of $n(1 - \rho_n)^{1/2}$ for $S_n$. Insofar as $\rho$ is unknown, the maximum rate at which
$h = h_n$ may shrink to zero, while still permitting the divergence of $S_n$ - and hence, the
consistency of $\hat{m}_n$ - will thus be determined by the region in which that divergence rate is
slowest, i.e. the local-to-unity region. This accounts for the requirement that $h_n^{-1} = o(n^{1/2})$
in SM2. This could be relaxed if $h_n$ were chosen in such a way as to adapt to the (unknown)
recurrence of $\{x_t\}$, but a consideration of such procedures is beyond the scope of this paper.

For a chosen spatial point $x \in \mathbb{R}$, a test of $H_0 : m(x) = \theta$ may be based on the
nonparametric t-statistic

\[ \hat{t}_n(x; \theta, h) := s_n(x; h)^{-1} [\hat{m}_n(x; h) - \theta] \tag{2.4} \]

where $s_n(x; h)$ is the standard error associated with $\hat{m}_n(x; h)$.

Test inversion leads to the familiar equal-tailed confidence interval for $m(x)$,

\[ C_n(x; h) := \{\theta \in \mathbb{R} \mid |\hat{t}_n(x; \theta, h)| \le z_{1-\alpha/2}\}
   = [\hat{m}_n(x; h) - z_{1-\alpha/2}\, s_n(x; h),\ \hat{m}_n(x; h) + z_{1-\alpha/2}\, s_n(x; h)], \tag{2.6} \]

where $z_\tau$ denotes the $\tau$th quantile of the standard normal distribution. $C_n(x; h)$ is a
'pointwise' confidence interval, in the sense that it concerns the value of $m$ at a particular,
fixed $x \in \mathbb{R}$, rather than over a collection of such points.
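
The t-statistic and its inversion can be sketched as follows. The standard error used here, the square root of sigma_u^2 times the integral of K^2 divided by the sum of K((x_{t-1} - x)/h), is a conventional plug-in form and is an assumption of this sketch; the error variance is passed in as a known argument for simplicity, and the Epanechnikov kernel (for which the integral of K^2 equals 3/5) is likewise assumed. All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

INT_K_SQUARED = 0.6  # integral of K^2 for the Epanechnikov kernel


def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_fit(x, y, x0: float, h: float):
    """Return (m_hat, sum_t K((x_{t-1} - x0) / h)) from the pairs (x_{t-1}, y_t)."""
    k = [epanechnikov((xl - x0) / h) for xl in x[:-1]]
    total = sum(k)
    if total == 0.0:
        raise ValueError("no observations within one bandwidth of x0")
    m_hat = sum(ki * yi for ki, yi in zip(k, y[1:])) / total
    return m_hat, total


def pointwise_ci(x, y, x0: float, h: float, sigma2_u: float, alpha: float = 0.05):
    """Equal-tailed interval m_hat -/+ z_{1 - alpha/2} * s_n (assumed standard error)."""
    m_hat, total = local_fit(x, y, x0, h)
    s_n = sqrt(sigma2_u * INT_K_SQUARED / total)  # assumption: plug-in standard error
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return m_hat, s_n, m_hat - z * s_n, m_hat + z * s_n


def t_statistic(m_hat: float, s_n: float, theta: float) -> float:
    """Studentised distance between the estimate and the hypothesised value theta."""
    return (m_hat - theta) / s_n
```

By construction, a hypothesised value lies inside the interval exactly when the absolute t-statistic does not exceed the normal quantile, which is the test-inversion relationship described above.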

The (finite-sample) coverage probability of $C_n(x; h_n)$ is given by

\[ \mathrm{CP}_n(x; m, \rho) := P_{m,\rho}\{m(x) \in C_n(x; h_n)\}, \tag{2.7} \]

where $P_{m,\rho}$ is indexed by the values of $m$ and $\rho$ generating the data. Of course, $P_{m,\rho}$ also
depends on $\gamma$: but in keeping with the preceding discussion of the hypothesis testing problem,
we shall regard $\gamma$ as fixed, and so suppress it from the notation here. We would like to
control the asymptotic size of $C_n(x; h)$, where this is defined as

\[ \mathrm{AsySz}(x) := \liminf_{n \to \infty} \inf_{(m,\rho) \in \mathcal{M} \times R} \mathrm{CP}_n(x; m, \rho); \tag{2.8} \]

a related quantity is the asymptotic maximum coverage probability^

\[ \mathrm{AsyMaxCP}(x) := \limsup_{n \to \infty} \sup_{(m,\rho) \in \mathcal{M} \times R} \mathrm{CP}_n(x; m, \rho). \tag{2.9} \]

It is known from existing results that $\hat{t}_n(x; m(x), h_n) \rightsquigarrow N[0,1]$ for every fixed $(m, \rho) \in \mathcal{M} \times R$.
However, while these results may be suggestive of $C_n$ as having the correct asymptotic size,
they are insufficient to support this conclusion, which instead requires that this convergence
hold along all drifting sequences $\{(m_n, \rho_n)\} \subset \mathcal{M} \times R$.

Establishing the required uniformity with respect to $m \in \mathcal{M}$ poses no particular
difficulty: due to the linearity of the local level estimator, $m$ affects only the bias of $\hat{m}_n$, and
the uniform negligibility of this term follows from standard arguments. On the other hand,
handling the nuisance parameter $\rho$ requires more care. Essentially, the problem is one of
proving


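\[ V_n(x) := \frac{h_n^{1/2} \sum_{t=1}^{n} K_{h_n}(x_t - x)\, u_{t+1}}{\left[\sum_{t=1}^{n} K_{h_n}(x_t - x) \int_{\mathbb{R}} K^2\right]^{1/2}} \rightsquigarrow N[0, \sigma_u^2] \tag{2.10} \]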


along a sufficiently large class of drifting sequences $\{\rho_n\} \subset R$. By adapting an argument
from Andrews and Cheng (2012), it may be shown that, for the purposes of computing
AsySz(x) and AsyMaxCP(x), it is sufficient to prove that (2.10) holds for the following
classes of convergent sequences in $R$ (see the proof of Proposition 2.1(ii) below):


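• stationary (with parameter $\rho$): $\{\rho_n\} \in \mathcal{R}_{\mathrm{ST}}^{\rho}$ if $\rho_n \to \rho \in [-1 + \delta, 1)$;

• mildly integrated: $\{\rho_n\} \in \mathcal{R}_{\mathrm{MI}}$ if $\rho_n \to 1$ but $n(\rho_n - 1) \to -\infty$; and

• local to unity (with parameter $c$): $\{\rho_n\} \in \mathcal{R}_{\mathrm{LU}}^{c}$ if $\rho_n \to 1$ and $n(\rho_n - 1) \to c \in \mathbb{R}$.

We may further restrict the sequences in $\mathcal{R}_{\mathrm{ST}}^{\rho}$ and $\mathcal{R}_{\mathrm{MI}}$; in particular, those in $\mathcal{R}_{\mathrm{MI}}$
may be taken to satisfy $\rho_n \in (0,1)$ for all $n$. Let $\mathcal{R}_{\mathrm{ST}} := \bigcup_{\rho \in [-1+\delta, 1)} \mathcal{R}_{\mathrm{ST}}^{\rho}$,
$\mathcal{R}_{\mathrm{LU}} := \bigcup_{c \in \mathbb{R}} \mathcal{R}_{\mathrm{LU}}^{c}$, and $\mathcal{R} := \mathcal{R}_{\mathrm{ST}} \cup \mathcal{R}_{\mathrm{MI}} \cup \mathcal{R}_{\mathrm{LU}}$.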

In all cases, the numerator of (2.10) is a martingale, and so is amenable to the application
of existing martingale central limit theory. The principal difficulty is thus to show
that the conditional variance of this martingale, upon standardisation, converges weakly
to an a.s. nonzero limit. Results of this kind are available in the literature for the case
when $\{\rho_n\} \in \mathcal{R}_{\mathrm{ST}} \cup \mathcal{R}_{\mathrm{LU}}$, but not when $\{\rho_n\} \in \mathcal{R}_{\mathrm{MI}}$: this necessitates the theoretical work
undertaken in Section 3.


2.3 Uniform validity of inferences 

Our main result on the uniform validity of tests and pointwise confidence sets is the
following. Let $\mathcal{X}$ denote a fixed, finite subset of $\mathbb{R}$. For a function $x \mapsto a(x)$, let $[a(x)]_{x \in \mathcal{X}}$
denote the vector $(a(x_1), \ldots, a(x_m))'$, for $\{x_1, \ldots, x_m\}$ an enumeration of $\mathcal{X}$.

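Proposition 2.1. Suppose DGP and SM hold, and that additionally $h_n = o(n^{-1/3})$. Then

(i) for every finite $\mathcal{X} \subset \mathbb{R}$,

\[ [\hat{t}_n(x; m_n(x), h_n)]_{x \in \mathcal{X}} \rightsquigarrow N[0, I_{\#\mathcal{X}}] \]

along every $\{m_n\} \subset \mathcal{M}$ and $\{\rho_n\} \in \mathcal{R}$; and

(ii) for each $x \in \mathbb{R}$, the confidence intervals $C_n(x; h_n)$ are asymptotically similar, i.e.

\[ \mathrm{AsySz}(x) = \mathrm{AsyMaxCP}(x) = 1 - \alpha. \]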

Remark 2.5. Part (i) establishes that any finite collection of t-statistics is jointly
asymptotically normal and independent across the spatial points $x \in \mathcal{X}$, along all the drifting
sequences that are relevant for the size calculation in part (ii). The latter further implies
that the t-test of $H_0 : m(x) = \theta$ - the inversion of which was used to construct $C_n(x; h_n)$
- is also asymptotically similar.

Remark 2.6. $h_n = o(n^{-1/3})$ is required to undersmooth the bias. If DGP1 were strengthened
such that the second derivatives of $m \in \mathcal{M}$ were assumed to be uniformly bounded, then
it would be possible to relax this requirement to $h_n = o(n^{-1/5})$. Under the null of
non-predictability considered below, $m$ is a constant function: in this case the bias of $\hat{m}_n$
vanishes, and the preceding condition on $h_n$ may be dropped entirely.

KAP are particularly concerned with testing the null that $x_{t-1}$ cannot predict $y_t$, which
may be formally expressed as

\[ H_0 : m(x) = \theta, \quad \forall x \in \mathbb{R}. \]

The authors base their tests of $H_0$ on a vector of t-statistics, $[\hat{t}_n(x; \theta, h_n)]_{x \in \mathcal{X}}$. The resulting
tests are perhaps more correctly regarded as tests of

\[ H_0^{\mathcal{X}} : m(x) = \theta, \quad \forall x \in \mathcal{X} \]

rather than of $H_0$, insofar as they only have power against alternatives to $H_0^{\mathcal{X}}$.

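Although $\theta$ is unknown, it is consistently estimable at rate $n^{-1/2}$ under $H_0$, uniformly
over $\rho \in R$, by $\hat{\theta}_n := \frac{1}{n} \sum_{t=2}^{n} y_t$. Accordingly, $[\hat{t}_n(x; \hat{\theta}_n, h_n)]_{x \in \mathcal{X}}$ has the same limiting
distribution as $[\hat{t}_n(x; \theta, h_n)]_{x \in \mathcal{X}}$. KAP consider the following test statistics,

\[ F_{n,\mathrm{sum}} := \sum_{x \in \mathcal{X}} \hat{t}_n^2(x; \hat{\theta}_n, h_n) \rightsquigarrow \zeta_{\mathrm{sum}}, \qquad
   F_{n,\max} := \max_{x \in \mathcal{X}} \hat{t}_n^2(x; \hat{\theta}_n, h_n) \rightsquigarrow \zeta_{\max}; \tag{2.11} \]

where, by part (i) of Proposition 2.1, the stated convergence holds along all $\{\rho_n\} \in \mathcal{R}$ (recall
$m(x) = \theta$ for all $x \in \mathbb{R}$ under $H_0$), for $\zeta_{\mathrm{sum}}$ having a $\chi^2[\#\mathcal{X}]$ distribution, and $\zeta_{\max}$ the
same distribution as the maximum of $\#\mathcal{X}$ independent $\chi^2[1]$ variates. For $i \in \{\mathrm{sum}, \max\}$,
let $c_{\tau,i}$ denote the $\tau$ quantile of $\zeta_i$, so that an $\alpha$-level test of $H_0$ based on $F_{n,i}$ rejects if and
only if $F_{n,i} > c_{1-\alpha,i}$. Thus, by almost identical arguments as were used to prove part (ii)
of Proposition 2.1, we have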

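Proposition 2.2. Suppose DGP and SM hold. Then for $i \in \{\mathrm{sum}, \max\}$ and every finite
$\mathcal{X} \subset \mathbb{R}$, a test of $H_0$ based on $F_{n,i}$ is asymptotically similar, i.e.

\[ \limsup_{n \to \infty} \sup_{\rho \in R} P_{\theta,\rho}\{F_{n,i} > c_{1-\alpha,i}\}
   = \liminf_{n \to \infty} \inf_{\rho \in R} P_{\theta,\rho}\{F_{n,i} > c_{1-\alpha,i}\} = \alpha. \tag{2.12} \]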
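
The two test statistics, and the critical value for the maximum statistic, can be sketched in a few lines. The critical value uses the fact that the maximum of m independent chi-square[1] variates has distribution function (2 Phi(sqrt(c)) - 1)^m, where Phi is the standard normal distribution function; this follows directly from independence. The function names are hypothetical, and the sum statistic would additionally need a chi-square quantile with m degrees of freedom from a statistics library, which is omitted here.

```python
from statistics import NormalDist


def f_sum(t_stats) -> float:
    """Sum of squared t-statistics over the chosen spatial points."""
    return sum(t * t for t in t_stats)


def f_max(t_stats) -> float:
    """Largest squared t-statistic over the chosen spatial points."""
    return max(t * t for t in t_stats)


def max_critical_value(alpha: float, m: int) -> float:
    """(1 - alpha) quantile of the maximum of m independent chi-square[1] variates.

    Solves (2 * Phi(sqrt(c)) - 1) ** m = 1 - alpha for c.
    """
    p = (1.0 - alpha) ** (1.0 / m)
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z


def reject_max(t_stats, alpha: float = 0.05) -> bool:
    """Reject the null of non-predictability when F_max exceeds its critical value."""
    return f_max(t_stats) > max_critical_value(alpha, len(t_stats))
```

With a single spatial point the critical value reduces to the square of the usual two-sided normal quantile, and it grows with the number of points, reflecting the multiplicity adjustment built into the maximum statistic.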

Remark 2.7. A more satisfying version of Proposition 2.1(i) would extend the result to
any finite collection of - data-dependent - points $\mathcal{X}_n \subset \mathbb{R}$ (with $\#\mathcal{X}_n$ fixed as $n \to \infty$).
Unfortunately, such a result is beyond the purview of the available martingale central limit
theory, at least under DGP.

Proofs of Propositions 2.1 and 2.2 appear in Appendix A. 


3 Spatial density estimation: a unified limit theory 

3.1 Preliminaries 

The preceding section is underpinned by some new results concerning the limiting behaviour
of the spatial density estimate $e_n^{-1} \sum_{t=1}^{n} K_{h_n}(x_t - x)$ when $\{x_t\}$ is mildly integrated -
that is, along drifting sequences $\{\rho_n\} \in \mathcal{R}_{\mathrm{MI}}$ that exhibit moderate deviations from a unit
root; here $e_n := e_n(\{\rho_n\})$ denotes a norming sequence. The proofs of these results in turn
rely on the following extension of Theorem 2.1 in Wang and Phillips (2009a, hereafter WP).
We first restate their assumptions, some of which will also be needed here. Let $\{x_{n,t}\}_{t=1}^{n}$ be
a triangular array, $\{\mathcal{F}_{n,t}\}_{t=1}^{n}$ a collection of $\sigma$-fields such that each $x_{n,t}$ is $\mathcal{F}_{n,t}$-measurable,
$f : \mathbb{R} \to \mathbb{R}$, and define $\Omega_n(\eta) := \{(l, k) \mid \eta n \le k \le (1 - \eta)n,\ k + \eta n \le l \le n\}$ for some
$\eta \in (0, 1)$.


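Assumption WP.

WP1 $f \in L^1 \cap L^2$.

WP2 There exists a stochastic process $X(r)$ on $[0,1]$ having continuous local time $\mathcal{L}_X(r, a)$
such that $x_{n,\lfloor nr \rfloor} \rightsquigarrow X(r)$ on $[0,1]$.

WP3 For all $0 \le s < t \le n$, $n \ge 1$, there exist constants $d_{n,s,t}$ such that

(a) for some $m_0 > 0$ and $C > 0$, $\inf_{(t,s) \in \Omega_n(\eta)} d_{n,s,t} \ge \eta^{m_0}/C$ as $n \to \infty$, and

i. $\lim_{\eta \to 0} \lim_{n \to \infty} \frac{1}{n} \sum_{t=(1-\eta)n}^{n} d_{n,0,t}^{-1} = 0$,

ii. $\lim_{\eta \to 0} \lim_{n \to \infty} \frac{1}{n} \max_{0 \le s \le (1-\eta)n} \sum_{t=s+1}^{s+\eta n} d_{n,s,t}^{-1} = 0$,

iii. $\limsup_{n \to \infty} \frac{1}{n} \max_{0 \le s \le n-1} \sum_{t=s+1}^{n} d_{n,s,t}^{-1} < \infty$;

(b) conditional on $\mathcal{F}_{n,s}$, $(x_{n,t} - x_{n,s})/d_{n,s,t}$ has a density $h_{n,s,t}(x)$ which is uniformly
bounded by a constant $K$, and

\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{|u| \le \delta} |h_{n,s,t}(u) - h_{n,s,t}(0)| = 0. \]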

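Our extension of WP's Theorem 2.1 consists of replacing WP2 with the following.

Assumption WP (continued).

WP2' There exists a stochastic process $\mu : [0,1] \times \mathbb{R} \to \mathbb{R}_+$, which is continuous a.s. with
$\int_{\mathbb{R}} \mu(r, x)\, dx < \infty$ for all $r \in [0, 1]$, such that for every $g \in \mathrm{BL}$,

\[ \frac{1}{n} \sum_{t=1}^{\lfloor nr \rfloor} g(x_{n,t} - a) \to_{fdd} \int_{\mathbb{R}} g(x - a)\, \mu(r, x)\, dx \tag{3.1} \]

over $(r, a) \in [0,1] \times \mathbb{R}$.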

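Proposition 3.1. Suppose WP1, WP2' and WP3 hold. Then for any $c_n \to \infty$ and $c_n/n \to 0$,

\[ \frac{c_n}{n} \sum_{t=1}^{\lfloor nr \rfloor} f[c_n(x_{n,t} - a)] \to_{fdd} \mu(r, a) \int_{\mathbb{R}} f \tag{3.2} \]

over $(r, a) \in [0,1] \times \mathbb{R}$.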

Remark 3.1. While WP2 is certainly sufficient for WP2' with $\mu = \mathcal{L}_X$, WP2 is unnecessarily
strong, being exclusive of certain processes for which (3.2) holds. Indeed, it is evident
from Jeganathan (2004) that (3.2) may obtain even if the convergence in WP2 holds only
in the sense of finite-dimensional convergence. The proof of his Lemma 8 implies that
WP2' holds whenever $x_{n,\lfloor nr \rfloor} \to_{fdd} X(r)$ and $\{x_{n,\lfloor nr \rfloor}\}$ is asymptotically equicontinuous in
probability, in the sense that for every $\epsilon > 0$,

\[ \lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{|r_1 - r_2| \le \delta} P\{|x_{n,\lfloor nr_1 \rfloor} - x_{n,\lfloor nr_2 \rfloor}| > \epsilon\} = 0. \tag{3.3} \]

However, as discussed in more detail in Remark 3.5 below, when $\{x_{n,t}\}$ is derived from a
mildly integrated process, even such an apparently weak requirement as (3.3) fails to hold:
though the finite-dimensional limit of $x_{n,\lfloor nr \rfloor}$ exists, it is not separable. For these processes,
WP2' must therefore be verified by other means.

Remark 3.2. (3.1) extends straightforwardly, via a suitable choice of approximating BIL
functions, to $g_a(x) := 1\{x \le a\}$, and thereby entails the convergence of the empirical
distribution function of $\{x_{n,t}\}$ to its population counterpart, i.e.

\[ F_n(a) := \frac{1}{n} \sum_{t=1}^{n} 1\{x_{n,t} \le a\} \to_{fdd} \int_{\{x \le a\}} \mu(1, x)\, dx =: F(a), \tag{3.4} \]

where $F$ is itself a distribution function if $\int_{\mathbb{R}} \mu(1, x)\, dx = 1$, as is generally the case. Insofar
as (3.4) holds, $F$ may be identified as the spatial distribution associated to the
finite-dimensional limit $X$ of $x_{n,\lfloor nr \rfloor}$. We shall accordingly refer to $x \mapsto \mu(1, x)$ as the spatial
density associated to $X$. Some such unifying term is needed here, because depending on the
process generating $x_{n,t}$, $\mu(1, x)$ may correspond to either the (non-random) probability
density of $X(1)$, or the local time density of $r \mapsto X(r)$, but not both.

3.2 Finite-dimensional convergence 

In applying Proposition 3.1 to the setting of DGP, we shall work with the scale-normalised
array defined by

\[ x_{n,t} := \mathrm{var}(x_n)^{-1/2} x_t =: d_n^{-1} x_t, \tag{3.5} \]

ensuring that the weak limit of $x_{n,n}$ has unit variance in all cases. Proposition 3.1 is broad
enough to cover the class of regressor processes contemplated in DGP, even in cases where
$\rho = \rho_n$ is allowed to depend on $n$. Indeed, it is the manner in which $\rho_n$ approaches unity
(if at all) that determines the spatial density appearing in (3.1). In accordance with the
division of the sequences $\{\rho_n\} \in \mathcal{R}$ given in Section 2.2 above, define

\[ \mu(r, a; \{\rho_n\}) :=
   \begin{cases}
     r \nu_\rho(a) & \text{if } \{\rho_n\} \in \mathcal{R}_{\mathrm{ST}}^{\rho}, \\
     r \varphi(a) & \text{if } \{\rho_n\} \in \mathcal{R}_{\mathrm{MI}}, \\
     \mathcal{L}_c(r, a) & \text{if } \{\rho_n\} \in \mathcal{R}_{\mathrm{LU}}^{c},
   \end{cases} \tag{3.6} \]

where $\nu_\rho$ is the density corresponding to the stationary solution to (2.2), normalised to have
unit variance; $\varphi$ is the standard Gaussian density; and $\mathcal{L}_c(r, a)$ is the local time density (at
time $r \in [0,1]$ and point $a \in \mathbb{R}$) associated to the normalised Ornstein-Uhlenbeck process,

\[ J_c(r) := \left( \int_0^1 e^{2(1-s)c}\, ds \right)^{-1/2} \int_0^r e^{(r-s)c}\, dW(s) \tag{3.7} \]

for $W$ a standard Brownian motion on $[0,1]$.

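For the purposes of the next result, $\{h_n\}$ denotes a deterministic, nonzero bandwidth
sequence; recall $f_h(x) := h^{-1} f(h^{-1} x)$. The proof appears in Appendix B.

Theorem 3.1. Suppose DGP holds with $\rho = \rho_n$ for some $\{\rho_n\} \in \mathcal{R}$, and $f \in L^1 \cap L^2$.
Then if $h_n = o(d_n)$ and $n d_n^{-1} h_n \to \infty$,

\[ \hat{\mu}_n(r, a; f, h_n) := \frac{d_n}{n} \sum_{t=1}^{\lfloor nr \rfloor} f_{h_n}(x_t - d_n a) \to_{fdd} \mu(r, a; \{\rho_n\}) \int_{\mathbb{R}} f, \tag{3.8} \]

over $(r, a) \in [0,1] \times \mathbb{R}$.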
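
The estimator in Theorem 3.1 is simple to compute at r = 1. The sketch below assumes i.i.d. standard Gaussian innovations, an Epanechnikov choice of f, and the approximation d_n = (1 - rho_n^2)^(-1/2) for the scale under mild integration; these are illustrative assumptions, and the function names are hypothetical. Under such a design one would expect the printed values to be in the vicinity of the standard Gaussian density at the chosen points, although a single simulated path can be quite noisy.

```python
import random


def epanechnikov(u: float) -> float:
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def spatial_density_estimate(x, a: float, h: float, d_n: float, f=epanechnikov) -> float:
    """(d_n / n) * sum_t f_h(x_t - d_n * a), with f_h(u) = f(u / h) / h (case r = 1)."""
    n = len(x)
    return (d_n / n) * sum(f((xt - d_n * a) / h) / h for xt in x)


def simulate_ar1(n: int, rho: float, seed: int = 0):
    """x_t = rho * x_{t-1} + eps_t with x_0 = 0 and i.i.d. N(0, 1) innovations."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


if __name__ == "__main__":
    n = 20_000
    rho_n = 1.0 - n ** -0.5            # an assumed mildly integrated sequence
    d_n = (1.0 - rho_n ** 2) ** -0.5   # assumed scale for unit-variance i.i.d. innovations
    path = simulate_ar1(n, rho_n)
    for a in (-1.0, 0.0, 1.0):
        print(a, spatial_density_estimate(path, a, h=0.5, d_n=d_n))
```

Note that the evaluation point is rescaled by d_n, so that the estimate is read on the scale of the standardised process rather than that of the raw regressor.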

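Remark 3.3. In view of Remark 3.6 below, $d_n \to \infty$ whenever $\{\rho_n\} \in \mathcal{R}_{\mathrm{MI}} \cup \mathcal{R}_{\mathrm{LU}}$;
the arguments given in the proof of Theorem 3.1 also imply that, in this case,

\[ \frac{d_n}{n} \sum_{t=1}^{\lfloor nr \rfloor} f_{h_n}(x_t - x) \to_{fdd} \mu(r, 0; \{\rho_n\}) \int_{\mathbb{R}} f \]

jointly with (3.8), for each $x \in \mathbb{R}$.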

Remark 3.4. The stationary ($\{\rho_n\} \in \mathcal{R}_{\mathrm{ST}}$) and local-to-unity ($\{\rho_n\} \in \mathcal{R}_{\mathrm{LU}}$) cases are
covered by the results of Wu and Mielniczuk (2002), Wang and Phillips (2009b) and Wu,
Huang, and Huang (2010). The proof under mild integration ($\{\rho_n\} \in \mathcal{R}_{\mathrm{MI}}$) is new to
the literature, and the arguments employed are a combination of those appropriate to the
stationary and local-to-unity cases.

As in the stationary case, one might envisage a 'direct' proof of (3.8), by proving the
asymptotic negligibility of

\[ \hat{\mu}_n - \mathbf{E}\hat{\mu}_n = \frac{d_n}{n} \sum_{t=1}^{n} [f_{h_n}(x_t) - \mathbf{E} f_{h_n}(x_t)], \]

and then demonstrating the convergence of $\mathbf{E}\hat{\mu}_n$ to the r.h.s. of (3.8) (here we have taken
$a = 0$ and $r = 1$ for simplicity). However, the lesser recurrence of mildly integrated
processes, as reflected in the reduced standardisation, significantly complicates the

problem. Most notably, straightforward calculations show that the bound given in (13) in 
Wu, Huang, and Huang (2010) would here imply only that 


\pn - Epnl <p {nhn) ^^"^dn + U 


(3.9) 


Since $d_n \asymp (1 - \rho_n^2)^{-1/2}$ under mild integration (see Remark 3.6 below), requiring negligibility
of the r.h.s. would thus exclude those $\{\rho_n\} \in \mathcal{R}_{\mathrm{MI}}$ for which $1 - \rho_n$ shrinks to zero
sufficiently quickly.

The failure of the bound in (3.9) to be useful over the whole of the mildly integrated
region necessitates the different proof strategy employed here, which is to use a kind of law
of large numbers to establish (3.1) for the standardised series $x_{n,t} = d_n^{-1} x_t$ (see
Proposition B.1 below), and then to invoke Proposition 3.1.

Remark 3.5. The tripartite classification in (3.6) is reflected in the different possible
finite-dimensional limits $X(r; \{\rho_n\})$ of the standardised regressor process $X_n(r) := d_n^{-1} x_{\lfloor nr \rfloor}$.
Under both stationarity and mild integration, the relatively weak dependence between
$X_n(r_1)$ and $X_n(r_2)$ vanishes in the limit, and so $X$ has the property that $X(r_1)$ and
$X(r_2)$ are independent for every $r_1 \ne r_2$. This explains why even such an apparently mild
equicontinuity requirement as (3.3) is unavailing for the purposes of proving Theorem 3.1.
Only under mild integration (where $d_n \to \infty$) does an invariance principle operate to
ensure that the marginals of $X(r)$ are standard Gaussian; in the stationary case, these
have density $\nu_\rho$.^ The limiting $X$ under mild integration thus corresponds to a
continuous-time, standard Gaussian white noise process, which we shall denote by $G$.

The strong dependence between $X_n(r_1)$ and $X_n(r_2)$ that is a characteristic of
local-to-unity processes ensures that, in this case, $X_n$ converges weakly to the diffusion $J_c$ (see
(3.7) above). As $c \to -\infty$, the finite-dimensional distributions of $J_c$ converge to those of
$G$, and in this sense there is continuity, in the limit, at the boundary demarcating the
mildly integrated and local-to-unity regions.

Remark 3.6. When $\{\rho_n\} \in \mathcal{R}_{MI} \cup \mathcal{R}_{LU}$, it may be shown that $d_n^2 \sim n\omega_n^2(\rho_n)\phi^2$, where

^lip) ■■= [l-e-(^-^^)]. (3.10) 

Jo n(l-p^) 

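(See Section S.1 in the Supplement for the proof.) In particular, $d_n^2 \sim \phi^2(1-\rho_n^2)^{-1}$ if
$\{\rho_n\} \in \mathcal{R}_{MI}$, and $d_n^2 \sim n\phi^2 \int_0^1 e^{2cs}\,ds$ if $\{\rho_n\} \in \mathcal{R}_{LU}$.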
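
As a rough numerical illustration of the two variance rates in Remark 3.6, the sketch below computes var(x_n) exactly in a simplified AR(1) design and compares it with the approximations (1 - rho_n^2)^{-1} (mild integration) and n * int_0^1 exp(2cs) ds (local to unity), which is how the remark is read here. Everything specific is an assumption made for illustration: i.i.d. unit-variance innovations (so the long-run scale is 1), initialisation x_0 = 0, the sequence rho_n = 1 - n^{-1/2} for the mildly integrated case, c = -2 for the local-to-unity case, and all function names.

```python
import math


def ar1_variance(rho: float, n: int) -> float:
    """Exact var(x_n) for x_t = rho * x_{t-1} + e_t with x_0 = 0 and
    i.i.d. unit-variance e_t (an assumed, simplified design)."""
    return sum(rho ** (2 * k) for k in range(n))


def mildly_integrated_approx(rho: float) -> float:
    """Approximation (1 - rho^2)^{-1} for the mildly integrated case."""
    return 1.0 / (1.0 - rho ** 2)


def local_to_unity_approx(c: float, n: int) -> float:
    """Approximation n * int_0^1 exp(2 c s) ds for rho_n = 1 + c / n."""
    integral = 1.0 if c == 0 else (math.exp(2.0 * c) - 1.0) / (2.0 * c)
    return n * integral


n = 100_000
rho_mi = 1.0 - n ** -0.5   # hypothetical sequence with n(1 - rho_n) -> infinity
c = -2.0                   # hypothetical local-to-unity parameter
rho_lu = 1.0 + c / n

ratio_mi = ar1_variance(rho_mi, n) / mildly_integrated_approx(rho_mi)
ratio_lu = ar1_variance(rho_lu, n) / local_to_unity_approx(c, n)
print(ratio_mi, ratio_lu)  # both ratios should be close to one
```

Both ratios being close to one is consistent with the two regimes sharing the single normalisation d_n, which is what allows a unified treatment across the mildly integrated and local-to-unity regions.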


3.3 Weak convergence of the spatial process 

By adapting some of the arguments from Duffy (2015), and fixing $r = 1$, it is possible
to strengthen the conclusion of Theorem 3.1 to weak convergence in $\ell_{ucc}(\mathbb{R})$, the space of
bounded real-valued functions on $\mathbb{R}$, equipped with the topology of uniform convergence on
compacta. We may also allow the bandwidth to be data-dependent, as well as to depend
on the location $a \in \mathbb{R}$. Let $\mu(a; \{\rho_n\}) := \mu(1, a; \{\rho_n\})$.

Assumption H. : M ^ M_|_ is continuous, with hn{a) G := for a// a G M 

w.p.a.l., where hn = o{dn) and h~^ = o{nd~^ log~‘^ n). 

Theorem 3.2. Suppose H and DGP hold, the latter with $\rho = \rho_n$ for some $\{\rho_n\} \in \mathcal{R}$. Then
for any $f \in \mathrm{BIL}$ with $\int_{\mathbb{R}} |f(x)x|\,dx < \infty$,

$$\mu_n(a; f, h_n) := \frac{d_n}{n} \sum_{t=1}^{n} f_{h_n(a)}(x_t - d_n a) \rightsquigarrow \mu(a; \{\rho_n\}) \int_{\mathbb{R}} f$$

in $\ell_{ucc}(\mathbb{R})$.

^Under mild integration, these assertions follow from arguments given in the proof of Proposition B.1(ii).

Remark 3.7. The result may be extended to a broader class of functions than BIL, such as 
is allowed for by Theorem 3.1 in Duffy (2015), by means of a similar bracketing argument 
as is given in the proof of that result. 

The proof of Theorem 3.2 appears in Appendix D. 
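
To make the statistic in Theorem 3.2 concrete, here is a minimal simulation sketch of the kernel functional $(d_n/n)\sum_t f_h(x_t - d_n a)$ in a stationary design. The following are assumptions for illustration and are not taken from the paper: the scaling convention $f_h(u) := h^{-1} f(u/h)$, a Gaussian choice of $f$, Gaussian i.i.d. innovations, $\rho = 0.5$, a fixed bandwidth $h = 0.2$, the use of the stationary standard deviation in place of $d_n$, and all function names. In this Gaussian stationary design the standardised regressor is approximately standard normal, so the value at $a = 0$ would be expected to sit near the standard normal density at zero, up to smoothing bias and sampling noise.

```python
import math
import random


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the integrable function f."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def simulate_ar1(rho: float, n: int, seed: int = 0) -> list:
    """Simulate x_t = rho * x_{t-1} + e_t with x_0 = 0 and N(0, 1) innovations."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def mu_n(a: float, path: list, d_n: float, h: float) -> float:
    """(d_n / n) * sum_t f_h(x_t - d_n * a), with f_h(u) = f(u / h) / h (assumed convention)."""
    n = len(path)
    total = sum(gaussian_kernel((x - d_n * a) / h) / h for x in path)
    return d_n * total / n


rho, n, h = 0.5, 100_000, 0.2
path = simulate_ar1(rho, n)
d_n = (1.0 - rho ** 2) ** -0.5   # stationary standard deviation, standing in for var(x_n)^{1/2}
estimate = mu_n(0.0, path, d_n, h)
print(estimate, 1.0 / math.sqrt(2.0 * math.pi))
```

The multiplication by $d_n$ and the centring at $d_n a$ express the estimator on the scale of the standardised regressor, which is what lets a single limit statement cover regressors whose variance may diverge.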

4 Conclusion 

This paper has established the validity of conventional nonparametric inferences in a
predictive regression, where the degree of persistence of the regressor is unknown (and possibly
very high). This opens the way for the systematic application of nonparametric methods to
predictive regression, in which setting they enjoy the considerable advantage of being able
to relate series with radically different memory properties. Our work on this problem has
necessitated the development of some new limit theory for kernel density estimators, in the
presence of mildly integrated processes. These new results fill an important gap in the
existing literature, and allow for a unified treatment of these estimators in an autoregressive
setting.

A Proof of uniform validity of inferences

Here, as throughout the remainder of the Appendices (excepting Section B.1) and the
Supplement, Assumptions DGP and SM are always maintained, even when not explicitly
referenced. Let $e_n := e_n(\{\rho_n\}) := n d_n^{-1}$, where $d_n := \mathrm{var}(x_n)^{1/2}$; we note that
$e_n(\{\rho_n\}) \lesssim n$ for all $\{\rho_n\} \in \mathcal{R}$.

Notation. For $p \in [1,\infty)$ and a function $f : \mathbb{R} \to \mathbb{R}$, $\|f\|_p := (\int |f(x)|^p\,dx)^{1/p}$ and
$\|f\|_\infty := \sup_{x \in \mathbb{R}} |f(x)|$; for a random variable $X$, $\|X\|_p := (\mathbb{E}|X|^p)^{1/p}$, and $\|X\|_\infty$ denotes
the essential supremum of $X$. BI denotes the class of bounded and Lebesgue-integrable
functions on $\mathbb{R}$. $C$, $C_1$, etc., denote generic constants which may take on different values
even at different places in the same proof. In keeping with the discussion of the nuisance
parameters $\gamma$ in Section 2.1 above, any dependence of these constants on $\gamma$ is ignored
throughout.

Proofs of the following auxiliary results appear in Section S.2 of the Supplement.

Lemma A.1. Suppose $\{\rho_n\} \in \mathcal{R}$. Then there exists a $C < \infty$ such that

n 

t=i 

Lemma A.2. For every $x \in \mathbb{R}$, $\hat\sigma_u^2(x) = \sigma_u^2 + o_p(1)$.

Proof of Proposition 2.1(i). Let x € M, and {mn\ and {pn} be as in the statement of the 
proposition. By Lemma A.2, straightforward calculations yield 

in{x;mnix),hn) = [Vn{x) + 6„(3:)](1 + Op(l)) 


where Vn{x) is as in (2.10) above, and 


, . X _ hl!‘^YJt=i^hSxt-x)[mn{xt)-mn{x)] __ bn,i{x) 

On\X) — 


(4 A2 Kh^ (xt - x)) 
Under DGPi, a mean-value expansion gives 


bnM^)' 


(A.l) 


\bn,l{x)\ < hf/^\\m'JooJ2^hli^i-^') 


(A.2) 


t=l 


by Lemma A.l, since Abl(rt) := K{u)\u\ G . Also by Theorem 3.1, 


en^^'^bn, 2 {x) p{x) '.= <Xu(^J 


1/2 




'i2p{a-^x) 

if {pn} G 


ip{0) 

if {Pn} G '^MI 

(A.3) 

_/:e(i,0) 

if {Pn} G 



which is strictly positive and finite a.s. (see Remark 2.1 above); here denotes the variance 
of the stationary solution to (2.2). Together (A.1)-(A.3) yield \bn{x)\ <p hf/'^el/'^, which 
is o(l) since hn = o(n“^/^), and hence 

inix;mn{x),hn) = Vn{x)[l + Op(l)]. 

The joint limiting distribution of [tn{x;mn{x),hn)]xeX' can thus be obtained via an 
application of an appropriate martingale CLT, and the Cramer-Wold device. Consider 


/ h \ 

Mn{x) := I — I - x)ut+i. 

\^nj 

Under DGP4, Mn is a martingale with conditional variance 

,2 


(A.4) 


{Mn {x)) = —Y^ Lhn {xt -x)-^p'^ (x), 

fin ^ 


(A.5) 


t=i 
where $L(u) := K^2(u)$. Furthermore, the (standardised) summands in (A.4) satisfy a
conditional Lyapunov condition, since under DGP4

/, \ 2 n , 

^ I sii s -!- = °{i). 

\ J ^n'^n 

using Lemma A.1. When $\{\rho_n\} \in \mathcal{R}_{ST} \cup \mathcal{R}_{MI}$, the r.h.s. of (A.5) is non-random, and so
the asymptotic Gaussianity of (A.4) follows from Theorem 3.2 in Hall and Heyde (1980).
When $\{\rho_n\} \in \mathcal{R}_{LU}$, note further that

/ r \ 1/2 « \ 1/2 n 

(“) '^Kh^{xt-x)\Etet+iUt+i\<(Tu(—] '^Kh^{xt-x) = o{l) 
\nenj \nen/ 

by the Cauchy-Schwarz inequality. Thus, an appeal to Theorem 2.1 of Wang (2014), 
together with the preceding, ensures that for all {pn} £ 

Mn{x) ip{x), 

where ^ =d A1[0,1], independent of p-, the preceding holds jointly with (A.3). 

Finally, regarding the joint limiting distribution of the vector [Mn{x)]x£^, we note 
that, for any x ^ x' and ax, ax’ € K, 

{axMn{x) + ax'Mnix')) 

2 L ^ 

= ^ [oixKh^ {xt - x) -h ax>Kh^ {xt - x')] ^ 

t=i 

2 n 2 ^ 

= -x) + al,Lh„{xt -x')] + oixax'—'^gnixt) 

t=i t=i 

where L{u) := K‘^{u) and Pniu) ■= hn ■ Kh^{u — x) ■ Kh^{u — x'). By Lemma A.l, 

1 ^ f 

— y'lffn(xt)| <p llffnill =hn Kh„{u - x)Kh^{u - x') du = o{l), 
fin Jr 

and so the Cramer-Wold device yields 


[Mn{x)]x(z,%- [^{x)p{x)]x(zjf:, 


where \^{x)\x^ 3 p =d Ai[0,/^^], independent of [r]{x)]x£^- Since this occurs jointly with 
the convergence in (A.3), the result follows. □ 

Proof of Proposition 2.1(ii). Let x G M. We shall only give the proof that AsySz(x) = 
1 — a; the proof that AsyMaxCP(x) = 1 — a is analogous. Our arguments largely follow 
those given in the proof of Lemma 2.1 in Andrews and Cheng (2012). Let i? := {m, p) G 
^ X R =: 0. For a given = (m„, pn), we write {i?n} G 7^ to signify that {pn} G 7Z, and 






J. A. DUFFY 


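similarly for the sets $\mathcal{R}_{ST}$, $\mathcal{R}_{MI}$ and $\mathcal{R}_{LU}$; note that for any $\{\vartheta_n^*\} \in \mathcal{R}$,

$$\lim_{n\to\infty} \mathrm{CP}_n(x;\vartheta_n^*) = \lim_{n\to\infty} \mathbb{P}_{m_n^*,\rho_n^*}\{m_n^*(x) \in \mathcal{C}_n(x;h_n)\}$$
$$= \lim_{n\to\infty} \mathbb{P}_{m_n^*,\rho_n^*}\{|\hat t_n(x; m_n^*(x), h_n)| \le z_{1-\alpha/2}\}$$
$$= 1 - \alpha,$$ (A.6)

where we have used (2.7), (2.6), and part (i) of the proposition, with this last ensuring
that $\hat t_n(x; m_n^*(x), h_n) \rightsquigarrow N[0,1]$ under $\{\vartheta_n^*\}$.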

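In view of (2.8), there is a sequence $\{\vartheta_n\} \subset \Theta$ and a subsequence $\{p_n\}$ such that

$$\mathrm{AsySz}(x) = \liminf_{n\to\infty} \mathrm{CP}_n(x;\vartheta_n) = \lim_{n\to\infty} \mathrm{CP}_{p_n}(x;\vartheta_{p_n}).$$ (A.7)

Since $R$ is compact, we may choose $\{p_n\}$ such that $\{\rho_{p_n}\}$ also converges to some $\rho \in R$.
Let $c_n := n(\rho_n - 1)$, and let $\{w_n\} \subset \{p_n\}$ be a further subsequence, to be chosen below.
Now either:

(i) $\rho < 1$: we may choose $\{w_n\}$ such that $\rho_{w_n} < 1$ for all $n$; or

(ii) $\rho = 1$: then either:

(a) $c_{p_n}$ is bounded: choose $\{w_n\}$ such that $c_{w_n} \to c \in \mathbb{R}$; or

(b) $c_{p_n}$ is unbounded: choose $\{w_n\}$ such that $c_{w_n} \to -\infty$, and $c_{w_n} < 0$ for all $n$.

Consistent with the preceding division, consider a new sequence $\{\vartheta_n^*\}$ satisfying either:

(i) $\vartheta_n^* = \vartheta_{w_k}$ for $w_{k-1} < n \le w_k$; or

(ii) $\vartheta_n^* = (m_{w_k}, 1 + n^{-1} c_{w_k})$ for $w_{k-1} < n \le w_k$;

as appropriate. Then $\vartheta_{w_n}^* = \vartheta_{w_n}$ for all $n$ by construction, and either: (i) $\{\vartheta_n^*\} \in \mathcal{R}_{ST}$;
(ii)(a) $\{\vartheta_n^*\} \in \mathcal{R}_{LU}$; or (ii)(b) $\{\vartheta_n^*\} \in \mathcal{R}_{MI}$. Thus $\{\vartheta_n^*\} \in \mathcal{R}$ in all cases, and so

$$1 - \alpha = \lim_{n\to\infty} \mathrm{CP}_n(x;\vartheta_n^*) = \lim_{n\to\infty} \mathrm{CP}_{w_n}(x;\vartheta_{w_n}^*)$$
$$= \lim_{n\to\infty} \mathrm{CP}_{w_n}(x;\vartheta_{w_n}) = \lim_{n\to\infty} \mathrm{CP}_{p_n}(x;\vartheta_{p_n}) = \mathrm{AsySz}(x),$$

by (A.6) and then (A.7). □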

Proof of Proposition 2.2. Under Hq, ]E(0„ — 0)^ = n“^cr^, and so 9n is n“^/^-consistent. 
Therefore, the same arguments as were employed in the proof of Proposition 2.1(i) now 
give 

[in{x', 9n, hn)]x£P^ ~ [tn{x', 9, hn')]x£.^ T Op(l) A^[0,/^^], (■^•8) 

along every {pn} G TZ. (Note that since m = 0 under the null, the nonparametric estimator 
TTiji has no bias, and so hn = is not needed to prove (A.8).) Hence, by the 
continuous mapping theorem,


limsupPo,p„{F„,i > ci-a,i} = liminfPo,p„{F„p > ci-a,i} = a (A.9) 

n—>-oo n^oo 

for i £ {sum, max}, along every {pn} G The passage from (A.9) to (2.12) now follows 
exactly the same lines as the proof of Proposition 2.1(ii). □ 


B Proof of finite-dimensional convergence 


B.l Proof of Proposition 3.1 

Similarly to the proof of Theorem 2.1 in Wang and Phillips (2009a), define 

[nr\ 

— y] f[Cn{Xk,n - O)] 

k=\ 

c r 

f[cn{xk,n -a + ze)](f{z) dz, 

” k=l 

and set Pe{x) := e~^ip{e~^x). It follows from Lemma 7 in Jeganathan (2004) that, for each 
e > 0 fixed, there is a non-random 6n = o(l) such that 


Ln{r,a) := 
Ln,eir,a) := 


Ln,eir, a) 


^ ^t{Xk n 

n 

k=l 


)j f 


^ <5n ^ 0 . 


Furthermore, the arguments used by Wang and Phillips (2009a) to prove that 


lim lim E|L„(r,a) - Ln^^{r,a)\ = 0, 

£—>■0 n—>-oo 

for each o £ M, which corresponds to (5.1) in that paper, require only their Assumptions 2.1 
and 2.3, both of which are maintained here (as WPi and WP3 respectively). Finally, by WP 2 ', 


\nr\ 


n 


y ^e{Xk,r 


a) '^fdd 


k=l 


/ ipe{x — a)p{r,x) dx 

Jr 

= / <y9(x)/i(r, ex-|-a) dx =/r(r, a)-|-Op(l) 


over (r, a) £ [0,1] x M as n ^ oo and then e —>■ 0, since p is continuous a.s. 


□ 


B.2 Proof of Theorem 3.1 

$\{\rho_n\} \in \mathcal{R}_{LU}$. Proposition 7.1 in Wang and Phillips (2009b), together with the
arguments used to prove their Proposition 7.2, establish that $\{x_{n,t}\}$ satisfies WP2 and WP3.
(Technically, the authors only consider sequences of the form $\rho_n = 1 + c/n$ for fixed $c \in \mathbb{R}$,
but their arguments clearly carry over to the slightly more general situation in which
$\rho_n = 1 + c_n/n$ for $c_n \to c \in \mathbb{R}$, as permitted by $\mathcal{R}_{LU}$.) Thus, in this case, the result follows
by Proposition 3.1.


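$\{\rho_n\} \in \mathcal{R}_{MI}$. In this case, we shall need the following two results, the proofs of which
are given in Appendix C. Recall the definition of $x_{n,t}$ given in (3.5) above.

Proposition B.1. Suppose $g \in \mathrm{BL}$ and $\{\rho_n\} \in \mathcal{R}_{MI}$. Then

(i) $\frac{1}{n} \sum_{t=1}^{\lfloor nr \rfloor} g(x_{n,t}) = \frac{1}{n} \sum_{t=1}^{\lfloor nr \rfloor} \mathbb{E} g(x_{n,t}) + o_p(1)$; and

(ii) $\frac{1}{n} \sum_{t=1}^{\lfloor nr \rfloor} \mathbb{E} g(x_{n,t}) \to r \int_{\mathbb{R}} g(x)\varphi(x)\,dx$.

Proposition B.2. Suppose $\{\rho_n\} \in \mathcal{R}_{MI}$. Then $x_{n,t}$ satisfies WP3 with $\mathcal{F}_{n,t} := \sigma(\{\varepsilon_s\}_{s \le t})$.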

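It follows immediately from Proposition B.1 that for every $g \in \mathrm{BL}$,

$$\frac{1}{n} \sum_{t=1}^{\lfloor nr \rfloor} g(x_{n,t} - a) = \frac{1}{n} \sum_{t=1}^{\lfloor nr \rfloor} \mathbb{E} g(x_{n,t} - a) + o_p(1) \to_p r \int_{\mathbb{R}} g(x - a)\varphi(x)\,dx$$

for each $(r, a) \in [0,1] \times \mathbb{R}$. Thus WP2' holds with $\mu(r,a) = r\varphi(a)$. By Proposition B.2,
$\{x_{n,t}\}$ satisfies WP3, whence the result follows by Proposition 3.1.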


{pn} £ Since < 1 in this case, it follows from Theorem 1 in Wu, Huang, and 

Huang (2010), with minor modifications, that 

d 

Pn{r,a;f,hn) = — y^ Efh^{xt - dno) + Op(l). 

” t=i 

Let Pp^t and fip^t respectively denote the Lebesgue density and characteristic function of xt, 
and Pp and fip those of the stationary solution to (2.2), for p < 1. The sequence 
is uniformly integrable, since |V’p,t(^)| < IV’£(-^)I foi' all t, and G L^. 

Let tn G {1,... ,n} with tn —>• oo. Since pn is bounded away from unity, ipp^,t„W — 
ipp^{\) —)• 0 for each A G R, and thus 

lbpn,tn -Ppniloo < ^ I An (^) “ An (A) | dA + / [ | An ,4n (E I + I An (^) I ] d A 

i{|A|<A} Jm>A} 

—> 0 , 


as n ^ oo and then A —>■ oo; a similar argument yields ||pp^ — PplA 0, as pn —>■ p < 1. 
Thus 


^fhSxtr, - dna) = / f{x)pp„^tMna + hnx) dx 
Jm. 

= / /(x)pp((ina +/inx) dx + o(l) ^>Pp((Tpa) / /, 

Jr Jr 
where we have used the fact that $d_n \to \sigma_\rho$, the standard deviation of the stationary solution
to (2.2). Thus


dr 


n 


\nr\ 


- dno) rapPp{(Tpa) f = rvp{a) / /. 


□ 


t=i 


C Proofs of auxiliary results for mild integration 

C.l Preliminaries 

Under DGP, we may regard $x_t$ as a nonstationary linear process, i.e. a weighted sum of the
underlying i.i.d. $\varepsilon_t$'s. Direct calculations show that $x_t = \sum_{k=0}^{\infty} a_{t,t-k}\varepsilon_{t-k}$, where

$$a_{t,t-k} := a_{t,t-k}(\rho) := \sum_{l=0}^{k \wedge (t-1)} \rho^l \phi_{k-l}.$$ (C.1)

Observe that this quantity does not depend on $t$ for $k \le t - 1$. Accordingly, we define

$$a_k := a_k(\rho) := \sum_{l=0}^{k} \rho^l \phi_{k-l} = a_{t,t-k}(\rho)$$ (C.2)

for $0 \le k \le t - 1$.
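
The weights in (C.1) can be checked numerically. The sketch below is a minimal illustration under stated assumptions, none of which is taken verbatim from the paper: the regressor is generated as $x_t = \rho x_{t-1} + v_t$ with $x_0 = 0$ and $v_t = \sum_j \phi_j \varepsilon_{t-j}$, the linear filter is a hypothetical finite one ($\phi = (1, 0.5, 0.25)$), and the function names are invented here. It compares the recursion with the weighted-sum representation, reading (C.1) as $a_{t,t-k} = \sum_{l=0}^{k \wedge (t-1)} \rho^l \phi_{k-l}$.

```python
import random


def a_coef(k: int, t: int, rho: float, phi: list) -> float:
    """Weight on eps_{t-k} in x_t: sum over l = 0..min(k, t-1) of rho^l * phi_{k-l},
    with phi_j treated as zero beyond the (assumed finite) filter length."""
    upper = min(k, t - 1)
    return sum(rho ** l * phi[k - l] for l in range(upper + 1) if k - l < len(phi))


def x_by_recursion(t: int, rho: float, phi: list, eps: dict) -> float:
    """x_t from x_s = rho * x_{s-1} + v_s, x_0 = 0, v_s = sum_j phi_j * eps_{s-j}."""
    x = 0.0
    for s in range(1, t + 1):
        v = sum(phi[j] * eps[s - j] for j in range(len(phi)))
        x = rho * x + v
    return x


def x_by_weights(t: int, rho: float, phi: list, eps: dict) -> float:
    """x_t as the weighted sum of innovations with the weights a_{t,t-k}."""
    k_max = (t - 1) + (len(phi) - 1)   # largest lag carrying a non-zero weight
    return sum(a_coef(k, t, rho, phi) * eps[t - k] for k in range(k_max + 1))


rng = random.Random(1)
phi = [1.0, 0.5, 0.25]                 # hypothetical filter coefficients
rho, t = 0.95, 30
eps = {s: rng.gauss(0.0, 1.0) for s in range(1 - (len(phi) - 1), t + 1)}
gap = abs(x_by_recursion(t, rho, phi, eps) - x_by_weights(t, rho, phi, eps))
print(gap)
```

For lags $k \le t - 1$ the truncation at $t - 1$ is inactive, which is why the weight can be written as $a_k$ without reference to $t$.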

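We shall make frequent use, throughout the following, of the decomposition

$$x_t = \sum_{k=0}^{\infty} a_{t,t-k}\varepsilon_{t-k} = \sum_{k=t-s+1}^{\infty} a_{t,t-k}\varepsilon_{t-k} + \sum_{k=0}^{t-s} a_k \varepsilon_{t-k} =: x_{-\infty,s-1,t} + x_{s,t,t}$$ (C.3)

for $s \in \{1,\ldots,t\}$: here $x_{-\infty,s-1,t}$ and $x_{s,t,t}$ are independent, being $\mathcal{F}_{-\infty}^{s-1}$- and $\mathcal{F}_s^t$-measurable
respectively, for $\mathcal{F}_s^t := \sigma(\{\varepsilon_r\}_{r=s}^{t})$. For $r \in \{s+1,\ldots,t-1\}$, $x_{s,t,t}$ further decomposes as

$$x_{s,t,t} = \sum_{k=s}^{t} a_{t-k}\varepsilon_k = \sum_{k=s}^{r} a_{t-k}\varepsilon_k + \sum_{k=r+1}^{t} a_{t-k}\varepsilon_k =: x_{s,r,t} + x_{r+1,t,t},$$ (C.4)

where $x_{s,r,t}$ and $x_{r+1,t,t}$ are respectively $\mathcal{F}_s^r$- and $\mathcal{F}_{r+1}^t$-measurable. Taking $s = 1$ in (C.3),
we have by independence that

$$\mathbb{E}x_t^2 = \sum_{k=0}^{\infty} a_{t,t-k}^2 = \sum_{k=0}^{t-1} \Big( \sum_{l=0}^{k} \rho^l \phi_{k-l} \Big)^2 + \sum_{k=t}^{\infty} \Big( \sum_{l=0}^{t-1} \rho^l \phi_{k-l} \Big)^2 =: V_{1,t}(\rho) + V_{2,t}(\rho),$$ (C.5)

where $V_{1,t}$ can be rewritten as

$$V_{1,t}(\rho) = \sum_{i=0}^{t-1} \phi_i^2 \sum_{k=0}^{t-1-i} \rho^{2k} + 2 \sum_{i=0}^{t-1} \sum_{j=i+1}^{t-1} \phi_i \phi_j \rho^{j-i} \sum_{k=0}^{t-1-j} \rho^{2k}$$ (C.6)

(see Section S.3 of the Supplement for details).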

Define kn '■= kn{{pn}) to be the largest integer for which 


K{{Pn]) < 1 V 


[(1 - pn ) ^ Are ]/2 

njl 


if {Pn} e 
if {Pn} G 


(C.7) 


for each n sufficiently large. Observe that kn in both cases. Proofs of the following 
elementary results are given in Section S.3 of the Supplement. 

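Lemma C.1. Suppose $\{\rho_n\} \in \mathcal{R}_{MI} \cup \mathcal{R}_{LU}$. Then

(i) there exist $0 < \underline{a} < \overline{a} < \infty$ and $k_0, n_0 \in \mathbb{N}$ with $k_0$ even, such that

$$\underline{a} \le \min_{k_0/2 \le k \le 2k_n} |a_k(\rho_n)| \le \max_{0 \le k \le n} |a_k(\rho_n)| \le \overline{a}$$ (C.8)

for all $n \ge n_0$; and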


(ii) there exists a 7 G ( 0 , 00 ) such that 


sup max 

Ti^TiQ kQl2'^k'^2kn 





*/|A|<l, 

*/|A|>l. 


Lemma C.2. Suppose $\{\rho_n\} \in \mathcal{R}_{MI}$. Then

(i) $\rho_n^{\varepsilon n} \to 0$ for any $\varepsilon > 0$;

(ii) $(1 - \rho_n^2) \sim 2(1 - \rho_n)$.
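
Both parts of Lemma C.2 are easy to inspect numerically. The sketch below assumes a particular mildly integrated sequence, $\rho_n = 1 - n^{-1/2}$, chosen here only for illustration, and reads part (i) as $\rho_n^{\varepsilon n} \to 0$, which is how the proof in Section S.3 proceeds; the function name and the value $\varepsilon = 0.1$ are likewise hypothetical.

```python
def rho_mi(n: int, alpha: float = 0.5) -> float:
    """Hypothetical mildly integrated sequence rho_n = 1 - n^{-alpha}, 0 < alpha < 1,
    for which n * (1 - rho_n) diverges."""
    return 1.0 - n ** (-alpha)


sample_sizes = (10 ** 2, 10 ** 4, 10 ** 6)
eps = 0.1

# Part (ii): (1 - rho_n^2) / (2 * (1 - rho_n)) = (1 + rho_n) / 2 should approach one.
ratios = [(1.0 - rho_mi(n) ** 2) / (2.0 * (1.0 - rho_mi(n))) for n in sample_sizes]

# Part (i): rho_n^{eps * n} should approach zero.
powers = [rho_mi(n) ** (eps * n) for n in sample_sizes]

for n, ratio, power in zip(sample_sizes, ratios, powers):
    print(n, ratio, power)
```

Part (ii) is just the identity $1 - \rho^2 = (1 - \rho)(1 + \rho)$ combined with $\rho_n \to 1$, while part (i) reflects $n(1 - \rho_n) \to \infty$.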


C.2 Proofs of Propositions B.1–B.2

We first state and prove the following auxiliary lemma, which is the key ingredient
in the proof of the first part of Proposition B.1. For a function $g \in \mathrm{BL}$, let
$\|g\|_{\mathrm{Lip}} := \sup_{x \neq y} |g(x) - g(y)|/|x - y|$.

Lemma C.3. For any g G BL, 


E 


'^[g{xt) -Eg{xt)] 


t=i 


< 


IbllLip ^ ^ 

k=0 \t=l 


1/2 


^Xt-k 


< 




(C.9) 


where the second inequality holds if \p\ < 1. 
Proof. Let Et[-] := E[- | Decompose 


n 00 n 00 

Qn ■= X][5(^*) “ ^aixt)] = ^^[E4_fcC/(xt) - Ep_i)_fc5(xt)] =: J^Mnk- (C.io) 

t=l k=0 t=l k=0 
Clearly, by the orthogonality of martingale differences,

n 

. (C.ll) 

t=i 

Now by the decomposition (C.3), 

k—1 oo 

^ ^ “t“ ^ ^ 

s=0 s=k-\-l 

k—1 oo 

—d ^ ^ “t“ dtd—k^ “1“ ^ ^ 5 

s=0 s=/c+l 

where e* =d is independent of {^t}, and hence also of Tt-k- Thus ^(t-i)-k9{^t) = 
Et-kgix^), whence 


\Et-kg{xt) - E(t_i)_fc 5 (xt)| = \Et-k[g{xt) - g{x*)]\ < \\g\\upat,t-kEt-k\£t-k - £* 
Hence, by (C.ll) and Jensen’s inequality, and recalling that cr^ = 1, 

EM^^i, < ‘2\\g\\1ipaf^-i._k, 


which together with (C.IO) yields the first inequality in (C.9). 
For the second inequality, we recall from (C.l) that 

n—l 

fc| ^ ^ ^ IpI H — ^n,fc 

1=0 


for 1 < f < n, with the convention that (/>_/ := 0 for / < 0. Hence if \p\ < 1, 

OO / n \ 1/2 OO 


n—l oo 




n 


k=o \t=i 


k=0 


1=0 k=0 


i/2^gM 

1 -H ■ 


□ 


Proof of Proposition B.1(i). We take $r = 1$ for simplicity; the proof for fixed $r \in [0,1)$ is
analogous. When $\rho \in (0,1)$, applying Lemma C.3 to the unstandardised process $\{x_t\}$ gives
the bound


E 


Yiaixt) -Eg{xt)] 

t=i 


— lll/llLip 


,^i/2 Er=oi4i 

1 -p 


(C.12) 


It follows that replacing xt by the rescaled process Xn,t = dy^^xt in (C.12) gives 


E 


1 "■ 

-Y^9{Xn,t) -Eg{Xn,t)] 

^ t=l 


< 1 . 


n 


1/2 


H l/n(l Pn) 



1 -Pn [n(l-p„)]V 2 


o(l)> 
where we have used Lemma C.2(ii) and the fact that $d_n \asymp (1 - \rho_n^2)^{-1/2}$ (see Remark 3.6). □

Proof of Proposition B.l(ii). Let e > 0. It is proved below that along every sequence 
{tn} C [ne,n], 

Xn,tr.-~^N[0,1], (C.13) 

whence E,g{xn,tn) ^id) ■— fm3(^)p(^) since g is bounded. Also by boundedness, 


^ [nrj 

n ^ 


t{9)] 


< elldlloo + sup \Eg{xn,t) - T{g)\ ^ e\\g\\oo- 

t £[ ne , n ] 


Since e was arbitrary, the result follows. 

It remains to prove (C.13). To that end, decompose 


t—1 OO 

^ ^ “1“ ^ ^ ^t^t—k^t—k — oo,0,i — ‘ 

k=0 k=t 

Note that and are independent, with variances respectively given by Vi^t and 
in (C.5) above. We shall prove below that for t = tn ^ [ne, n], p = pn and := (1 — p^)~^, 

r"^var(xt„) = r"Vi,t^(pn) + o(l) ^ . (C.14) 

Thus var(xt^) ~ ~ nu)^{pn)4>^, where is dehned in (3.10) above. Further, we may 

write = Ylk=-ooKk^k, where 

^n,k — A 

[0 otherwise. 

Since pji ^ 1 it follows that for all n sufficiently large 

OO 

max|(5„,fc| < a \at^,k\ < (1 - = o(l)- (C.15) 

2 = 0 

Together, (C.14) and (C.15) permit the application of Lemma 2.1(i) in Abadir, Distaso, 
Giraitis, and Koul (2014), yielding 

Xn,t„ = var(xt„)"^/2^i^ ^ + o(l) A^[0,1]. 

We turn lastly to the proof of (C.14). From (C.6), and the fact that p„ € (Ojl)) we 
have 


tn 1 in 1 in 1 

(1 - pDvum = E +2 E E 

2=0 2=0 j=i-\-l 
Also by Lemma C.2, < 1 for all 0 < z < ~ Ij and pn*" ^ 0 for 

each fixed z > 0. Thus, in view of ^ 

tfi 1 tn 1 tn 1 

(1 - Pn)^l,tn(Pn) ='^^i+‘^Yl Y1 ^ 

i=0 i=0 j=i+l 

Regarding we note from (C.5) that 

OO tfi 1 tfi 1 

V2t„(Pn) Pn\<Pk-l\ < Y PnY-h 

k=tn 1=0 1=0 

where := Finally, 

tri —1 /\tnl‘l\—\ tn — 1 \ \tn/2\ 

Y Pn^^r.-l = + X] ]pnY-l ^ (<^LZn/2j +Pn"^‘^^) Y P^^ = 

/=0 \ 1=0 i=[t„/2j/ /=0 

using that <^[ 4 ^/ 2 ] ^ 0 and —>■ 0 (by Lemma C.2(i)), and 

LZn/ 2 j 

Y pL < (1 - Pn)~^ - (1 - pI)~^ = rn 

1=0 

by Lemma C.2(ii), whence V 2 z„(Pn) = o[rn). □ 

Proof of Proposition B.2. We take dn,s,t = 1 for all n, s and t. Then part (a) of WP3 is 
trivially satisfied. For part (b), recall the decomposition in (C.3), 

Z—s —1 OO 

^ ^ T ^ ^ ®Z,t—s 

k=0 k=t—s 

t—s—1 00 

— ^ ^ Oik^t—k T ^ ^ (kt^t—sCt—s — Xs+l,t,t T oo,s,Z) 

k=0 k=t—s 

for 1 < s < t < n, so that 


Xg — T (^—oo,s,Z ®s) —• ^s+l,Z,Z T Usjt- 


where Xs+i,t,t is independent of ys,t- Noting Xs+i,t,t =d xi^t-s,t-s and taking r := t — s, we 
thus have 


Xn,t Xn,s —d 3^1,r,r T yn,s,t 


where —d dn^ys,t is independent of xi^r,r- 

In view of the definition of n„(p), part (b) of WP3 only concerns s and t for which 
(1 — 5)n >t — s = r = rn>n5 for some 5 € (0,1). For such r„, it follows from arguments 
given in the proof of Proposition B.l(ii) that Zn ■= d~^xi^r„,rn -N[0,1]. Letting /„ 
denote the characteristic function of $z_n$, arguments given in the proof of Corollary 2.2 in
Wang and Phillips (2009a) then imply that part (b) holds if the sequence $\{f_n\}$ is uniformly
integrable. In order to prove this, we first observe that

l/n(A)| = |]Eexp(iA 2 ;„)| < ak{pn)\\ 

s=ko 


whenever n is sufficiently large that tn > n6 > kn- Hence for any H > 0, 

kn 

/ |/n(A)|dA</ T\\'llJe[Xd~^ak{pn)]\d\, 


— dr] 




\'lpe[Xak{pn)]\ dX 


—(J dn 


dA + / |V-.[AafcJp„)]| dA 




J{\X\>Ad-^} 

[ dX + dne-^^^" f |V’s(A)|dA 

'{|A|>A} Jr 


as n —>■ oo and then H —>■ 0, where we have used Lemma C.l and the fact that kn 
Thus {fn} is uniformly integrable. □ 


D Proof of weak convergence (Theorem 3.2) 

$\{\rho_n\} \in \mathcal{R}_{MI} \cup \mathcal{R}_{LU}$. In this case, the proof of Theorem 3.2 closely follows the proof
of Theorem 3.1(i) in Duffy (2015), as outlined in Section 4 of that paper. In particular,
appropriate analogues of two key intermediate results - Propositions 4.1 and 4.2 in Duffy
(2015), reproduced below as Proposition D.1 - are proved below by means of a martingale
decomposition similar to that developed in Section 7.1 of that paper. To this end, let
$\mathbb{E}_t[\cdot] := \mathbb{E}[\cdot \mid \mathcal{F}_{-\infty}^t]$, and consider

$$f(x_t) = \sum_{k=1}^{t \wedge k_n} [\mathbb{E}_{t-k+1} f(x_t) - \mathbb{E}_{t-k} f(x_t)] + \mathbb{E}_{[t-k_n]_+} f(x_t),$$

where $[a]_+ := a \vee 0$: note that unlike Duffy (2015, Sec. 7.1), the decomposition here is
truncated at $[t - k_n]_+$ rather than at 0. Defining

$$\xi_{k,t} f := \mathbb{E}_t f(x_{t+k}) - \mathbb{E}_{t-1} f(x_{t+k})$$ (D.1)

we have
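$$\mathcal{S}_n f := \sum_{t=1}^{n} f(x_t) = \sum_{t=1}^{n} \mathbb{E}_{[t-k_n]_+} f(x_t) + \sum_{k=0}^{k_n-1} \sum_{t=k+1}^{n} [\mathbb{E}_{t-k} f(x_t) - \mathbb{E}_{t-k-1} f(x_t)]$$
$$= \mathcal{N}_n f + \sum_{k=0}^{k_n-1} \sum_{t=1}^{n-k} [\mathbb{E}_t f(x_{t+k}) - \mathbb{E}_{t-1} f(x_{t+k})] = \mathcal{N}_n f + \sum_{k=0}^{k_n-1} \mathcal{M}_{n,k} f$$ (D.2)

where $\mathcal{N}_n f := \sum_{t=1}^{n} \mathbb{E}_{[t-k_n]_+} f(x_t)$ and $\mathcal{M}_{n,k} f := \sum_{t=1}^{n-k} \xi_{k,t} f$. $\{\xi_{k,t} f, \mathcal{F}_{-\infty}^t\}_{t=1}^{n-k}$ forms a
martingale difference sequence for each $k$ by construction, and so control over each $\mathcal{M}_{n,k} f$
will be deduced from control over

$$\mathcal{U}_{n,k} f := [\mathcal{M}_{n,k} f] = \sum_{t=1}^{n-k} \xi_{k,t}^2 f, \qquad \mathcal{V}_{n,k} f := \langle \mathcal{M}_{n,k} f \rangle = \sum_{t=1}^{n-k} \mathbb{E}_{t-1} \xi_{k,t}^2 f.$$ (D.3)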

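To state our bounds on the foregoing, we first define the norm

$$\|f\|_{[\beta]} := \inf\{c \in \mathbb{R}_+ \mid |\hat f(\lambda)| \le c|\lambda|^{\beta} \ \forall \lambda \in \mathbb{R}\},$$ (D.4)

for $\beta \in (0,1]$, where $\hat f(\lambda) := \int e^{i\lambda x} f(x)\,dx$ denotes the Fourier transform of $f$. (See
Section 4.2 and Lemma 9.1 in Duffy (2015) for more details on $\|f\|_{[\beta]}$.) Let $\mathrm{BI}_{[\beta]} := \{f \in
\mathrm{BI} \mid \|f\|_{[\beta]} < \infty\}$,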

?n(^, /) := ll/lloo + end;;^(||/||i + ||/||[/3]) 


and 


■= 


+ 6?^ 


^-(3+2/3)/2|| .||2 , 


if fc G {0,..., A:o - 1}, 
if A: G {ko, ...,kn-l}-, 


and for ^ C Blrai, 


<5n(/3, ■— ll^^lloo + ey^||^||2 + endri^dl^lll + ||=^||[/3]), 

where ||^|| :=supjg^||/||. Define 

^ := Piipn}) ■= sup{^ G (0,1] I = o(l)}. 


(D.5) 


Since dn < < e„, P{{pn}) > ^ for all {pn} £ "^mi ^LU- Proofs of the following 

results appear in Section S.4 of the Supplement. 

Lemma D.l. Suppose {pn} G P' "^lu ^ (0,/?)• Then there exists a C < oo 

such that 

||A/'„/||oo <(- ?n(^,/) (D.6) 

and 

\\ldn,kf\\r, V ||V„,fc/||., CT^ ,(/3, /) (D.7) 
for all n > no, 0 < k < kn — 1 and f € BI. 

Lemma D.2. Suppose {pn} S ^mi ^LU ^ (0)/3)- Then there exists a C < oo 

such that 

hn 1 

SUp?n(/3,/)+ 'V] SUpCJn,fc(^,/) <C5n{l3,^) 

for all ‘IS C BI[^]. 

With the preceding lemmas taking the places of Lemmas 7.3 and 7.5 in Duffy (2015), 
the next result follows from almost exactly the same arguments as are used to prove 
Propositions 4.1 and 4.2 in that paper. The very minor modifications that are required 
merely reflect the slight differences between an,k and bn as they appear above, and the 
corresponding quantities in Duffy (2015); the reader is accordingly referred to that paper 
for the details of the proof. Let k{x) := (1 — |x|)l{x G [—1,1]} denote the triangular kernel 
function, and /r„(a;/) := /i„(l, a;/, 1). As in Duffy (2015, Sec. 4.2), ll-Urg/a denotes the 
Orlicz norm associated to the convex and increasing function 


T3/2{x) 


x{e — 1) if a: G [0,1], 
e^ ^ — 1 if X G (1, oo). 


Proposition D.l. Suppose {pn} S TZ^j U (3 G (0,/3). Then 

(i) there exists a C < oo such that 


sup l|/r„(ai;K) -/in(a 2 ;«^)||r 2/3 <^ 101 - 02 !^; 
ai,a2eM 


(ii) if c BI[^] with , then max/^j^^ |<S„/| <p (5„(/3, ^„) log n, whence 

max|5„/| = 0 ( 1 ) 

J 

ifW^nWi < 1, ||=^n||[/3] = o{d?i,) and ||=^n||oo = o(enlog“^re). 

The proof of Theorem 3.2 may now proceed almost exactly along the lines of the
proof of Theorem 3.1(i) in Duffy (2015), with Proposition D.1 here playing the role of
Propositions 4.1 and 4.2 there. Let $M < \infty$; the desired convergence in $\ell_{ucc}(\mathbb{R})$ will follow
from convergence in $\ell_\infty([-M,M])$, the space of bounded functions on $[-M, M]$, equipped
with the topology of uniform convergence. As per the argument in Section 6 of Duffy
(2015), it follows immediately from part (i) of Proposition D.1 that $\mu_n(a;\kappa)$ is tight in
$\ell_\infty([-M,M])$, whence $\mu_n(a;\kappa) \rightsquigarrow \mu(a)$ in $\ell_\infty([-M,M])$. Further, for any $f$ as in the
statement of Theorem 3.2,


sup 

ae[-M,M] 




< sup 


Hn{a-J,h) - nia;K) / / 

Jr 


= Op(l) 


where the inequality holds w.p.a.l under H, while the equality follows from part (ii) of 
Proposition D.l, together with (6.2)-(6.3) in Duffy (2015) and the subsequent arguments. 
Thus Hn{a-, f,hn{a)) ^(a) in ioo{[-M,M]). 

$\{\rho_n\} \in \mathcal{R}_{ST}$. In this case, the result follows from essentially the same arguments as are
used to prove Theorem 2 in Wu, Huang, and Huang (2010). □

S Supplementary material

S.l Verification of Remark 3.6 

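The claim to be proved is that, when $\{\rho_n\} \in \mathcal{R}_{MI} \cup \mathcal{R}_{LU}$, $d_n^2 := \mathrm{var}(x_n) \sim n\omega_n^2(\rho_n)\phi^2$, for $\omega_n$
as in (3.10). When $\{\rho_n\} \in \mathcal{R}_{MI}$, this follows from (C.14) with $t_n = n$. We therefore suppose
that $\{\rho_n\} \in \mathcal{R}_{LU}$. In this case, $c_n := n(\rho_n - 1) \to c \in \mathbb{R}$ and $\omega_n^2(\rho_n) \to \int_0^1 e^{2cs}\,ds$.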

Define 



(S.l) 


for k G {1,... ,n}, with the final equality holding by continuity when p = 1. Recall from 
(C.5)-(C.6) above that Ex^ = Vi,n(p) + V 2 ,n(p)) where 


n—1 


n—1 n—1 



(S.2) 


2=0 


2=0 j=i-\-l 


If we can show that 


(i) ^gn-i{pn) fo ^^‘^ds as n —>■ oo, for each fixed i > 0; and 

(ii) maxi<k<n^lffk(Pn)l is uniformly bounded; 

then in view of ^ (S-2), it will follow immediately that 



as n —>■ oo, whence Vi,n(Pn) ~ 

For (i), we first suppose that Cn ^ c 0. Then 





To handle the case where Cn —>■ 0, we note hrst that = 1 + x + o(x) as (y, x) —)• (e, 0). 
Hence 



For (ii), we note from (S.l) that \gk{p)\ < k — 1 whenever p < 1, while if p > 1, |pfc(p)| is 
maximised by taking k = n, and so boundedness follows from (i) with f = 0. 

It remains to show that V 2 ,n(Pn) = o(n). Taking n sufficiently large, p^ > 0 and so 



V2,n{pn) = '^{'^Pn(t>k-l\ ^ X] ( Z] I ) 


OO /n—1 \ ^ oo /n—1 \ ^ oo ri—1 


k=n \l=0 / k=n \l=0 / k=n 1=0 
Finally,

oo n—1 / 2n oo \ n—1 n oo oo 

k=n 1=0 \k=n k=2n-\-l/ /=0 k=0 l=k k=n 

S.2 Proofs of Lemmas A.1–A.2

Proof of Lemma A.l. Suppose {pn} G ^ST- Then Fourier inversion and the decomposition 
Xt = (foEt + X-oo,t-i,t gives, for any positive-valued f ^ L^, 

IE/(xt) I\f{X)\\M-M \dA < </>o'lliA.Ilill/lli ll/lli (S.3) 

since (j)Q ^ 0. This, together with the fact that x n, yields the result in this case. 

Now suppose {pn} S ^mi' d ^LU' ^dis case, (S.3) continues to hold, and an appeal 
to Lemma S.2(i) below yields 

, n / k„ \ 

— ^E/(xf) — fco + J]] + (n - ||/||i. 

t=\ \ t=fco / 

The bracketed term on the r.h.s. has the same order as 

= e-^dn + 1 < 1 


since, in particular, nkn^^"^ x = e^. □ 

Proof of Lemma A.2. The proof follows the same lines as the proof of Theorem 3.2 in 
Wang and Phillips (2009b). We first note that 


n 

Y^h^xt - x)[yt+i - rhn{x)]^ 

t=l 

n n 

= Y^hr^ixt - x)uY +Y^h„{xt - x)[mn{xt) - mn{x)]ut+l 

t=l t=l 

n 

+ Y^hAxt - x)[mnixt) - rhn{x)f 

t=l 

=• ^n,l + ^n,2 + ^71,3- 


Now, letting (t := — cr'^, we claim that 


A 


n,l 


n 1 ^ 

Y {xt-x) + —Y ^hn {xt - x)Ct+l 


2 n 

t=l 


(S.4) 


t=i 


(^Ivix) 


for p{x) as defined in (A.3) above. The convergence of the first r.h.s. term in (S.4) follows 
from Theorem 3.1. For the second term, we note that

V 

x)Ct+i 1 

1 ” c 

\ Gt]] < = o(i) 

by Lemma A.l and the a.s. boundedness of sup^ | Gt], where L(u) := K'^{u). 

Next, note that for any x G M, |m,i(x) — mn{x)\ = Op(l) follows from arguments similar 
to those used in the proof of Proposition 2.1(i). Thus 

A 1 ^ 

<c — '^Kh^{xt -x){[mn{xt) -mn{x)f + [mnix) -mn{x)f} 

/}2 ^ 

< ll"inllL—-^) +Op(l) 

t=l 

= Op(l), 

by a mean-value expansion and Lemma A.l; here := K{u)v?‘. Finally, by the 

Cauchy-Schwarz inequality, 

^n,2 < 

and so by Theorem 3.1 and the preceding. 


/ n 

V t=i 


^l{x) 


^n,l T -^11,2 T ^n,3 ^u^l^x') 

llt=i^hAxt-x) r]{x) 


□ 


S.3 Proofs of auxiliary results from Appendix C 

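Proof of (C.6). Letting $m := t - 1$, we have that $V_{1,t}(\rho)$ is equal to

$$\sum_{k=0}^{m} \sum_{i=0}^{k} \sum_{j=0}^{k} \rho^{2k-i-j} \phi_i \phi_j = \sum_{i=0}^{m} \sum_{j=0}^{m} \phi_i \phi_j \sum_{k=i \vee j}^{m} \rho^{2k-i-j}$$
$$= \sum_{i=0}^{m} \phi_i^2 \sum_{k=i}^{m} \rho^{2(k-i)} + 2 \sum_{i=0}^{m} \sum_{j=i+1}^{m} \phi_i \phi_j \sum_{k=j}^{m} \rho^{2k-i-j}$$
$$= \sum_{i=0}^{m} \phi_i^2 \sum_{k=0}^{m-i} \rho^{2k} + 2 \sum_{i=0}^{m} \sum_{j=i+1}^{m} \phi_i \phi_j \rho^{j-i} \sum_{k=0}^{m-j} \rho^{2k}.$$ □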


Proof of Lemma C.l(i). When {pn} G ^lU’ result follows essentially from arguments 
given in Wang and Phillips (2009b): see their (7.14), in particular. We therefore turn 
to the case where {pn} £ ^mf Then pn G (0,1), and the upper bound in (C.8) follows 
trivially from \ak{pn)\ < Further, for any 0 < /c < 2kn, 


„2fc„ 

Pn 


<Pn< Pn’" < Pn'^’"" 
Noting that ^ as p —>■ 1, and 2kn ~ {l—pn)~^, it follows that {p"^", Pn^^") 

(e“^, e) as n ^ oo. Thus there exists an no and Ci,C 2 G ( 0 , oo) such that p^, p~^ G [Cl-, ^ 2 ] 
for all n > riQ and 0 < k < 2kn- 

Now ak{pn) = Pn ELo Pn’'(t>h and for any m < k < 2kn, 


k m m k 

1=0 1=0 1=0 l=m-^l 


Therefore, since \p^\ < 1, 

m 

O'kiPn) ~ Pn ^ ^ ' 


/=0 


<^\i-p-^\\^,\+ Y1 \^i\ 

1=0 l=m-\-l 


Let mo be chosen such that both 


mo 

Y^i 

1=0 


>Ci 


E' 

1=0 


mo ^ 

> Yu\ =: 3a 


for all n > no, and — — Since p^^ ^ 1 for each I, there exists an ni > no 

such that 


(kk{pn) I ^ Pn 




1=0 


Ell 


1=0 


pJUi\ 


k 

Y - - 

/=mo+l 


for all n > ni- Taking k^ := 2mo and re-designating ni as no completes the proof. □ 

Proof of Lemma C.l(ii). Since G L^, eo has a bounded continuous density. Thus by the 
Riemann-Lebesgue lemma (Feller, 1971, Lem. XV.3.3) limsup|;^|^oo|'^/le(A)| = 0. Further, 
ife G cannot be periodic, and so |i/’£(A)| < 1 for all A 7 ^ 0 (Feller, 1971, Lem. XV. 1.4). 

Since is necessarily continuous (Feller, 1971, Lem. XV.1.1), it follows from part (i) 
of the lemma that 

sup sup sup |'0£[Ofc(/9n)A]| < 6“"^° 
n>no kci<k<kn |A|>1 

for some 70 G (0, 00 ). By the moments theorem for characteristic functions (Feller, 1971, 
Lem. XV.4.2), we have the expansion ipsiX) = 1 — + where r(A) —)■ 0 as A —)■ 0. 

In view of part (i) of the lemma, there thus exists a 71 G (0, 00 ) such that 


sup sup \f}e[ak{Pn)X]\ < e 

n>no ko<k<kn 

for all |A| < 1. The result now follows by taking 7 := 70 A 71 . □ 

Proof of Lemma C.2. Letting $c_n := n(\rho_n - 1) \to -\infty$, we note that for every $M < \infty$, we
may take $n$ sufficiently large such that $c_n < -M$, whence


Pn = 1 + 


n 


< ( 1 -^ e“™" ^ 0 

n / 


as n —)■ oo and then M ^ oo. Thus (i) holds, (ii) follows from 


1 -Pn 
1 - Pn 

S.4 Proofs of Lemmas D.l—D.2 


— 1 + —>■ 2 . 


□ 


The proof of Lemma D.1 requires the following two results, which here play the role of
Lemmas 7.4 and 9.3 in Duffy (2015); see Section S.5 for the proofs.

Lemma S.l. Suppose (3 € (0,/3). Then there exists a C < oo such that 


n—k—t 


s=l 


Ultf\\oo+ Y. \\^tfk,t+sf\\oo<c<k{PJ) 
when k G { 0 ,..., kg — 1 }, and 

UltfW- 


(S.5) 


{n—k—t)/\kn 

Y W^t^lt+sfWoo <c n~^knal ,,il3, /) 

S=1 


(S. 6 ) 


when k € {kQ,... ,kn — 1}, for all n > no, 1 < t < n — A; and f G Bfj^j. 

Lemma S.2. Suppose / G Bf. There exists a C < oo, not depending on f, such that 

(i) for every t > 0 and k^ < k < n — t, 

^t\fixt+k)\ <(j (/c A/c„)"^/2||/||i; 

(ii) if in addition f G Bfj^] for some (3 G (0,1], then for every t > 0 and ko < k < n — t, 

\^tf{xt+k)\ <c (A: Afc,,)-(i+/')/2||/||[^] 


1 - 


Proof of Lemma D.l. By Lemma S.2(ii), 


' ko — 1 kn n 

Wnf\ < I E + E + E I 

t=l t=ko t=/Cn + l> 


^ II./ Iloo 




whence (D. 6 ), noting that ne = Cndn^. Regarding (D.7), 

in view of Lemma 7.2 in Duffy (2015) it suffices to show 


\Pn,kf\\p V \\Vn,kf\\p <c P'-^^^(xlk{l3, f) 

S5 
for every $p \in \mathbb{N}$. To prove (S.7), consider decomposing $\mathcal{V}_{n,k}$ into $L$ blocks, as per

L Ti L 

Vn,kf = E E ^t-iek,tf =-^Vn,k,lf 
1=1 t=n-i 1=1 

for endpoints 0 < tq < • • • < tx, < n — k. For the Th block, repeated application of the 
law of iterated expectations yields 


nvn,k,if\^<p\- E ••• E 

tl=Ti_i-\-l tp—l=tp — 2 

Tl tp — 

II^Mp_l/lloo + E 


_i+sl|oo 


; (S.8) 


S = 1 


Since suitable bounds for the final term (in 

parentheses) are provided by Lemma S.l. In particular, when k £ {0,... , feo — 1}, we may 
take To = 0 and ti = n — k, so that (S.5) immediately yields 


E|V.,fc/P<p!-CV2^(/3,/). 


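When $k \in \{k_0, \ldots, k_n - 1\}$, we set $\tau_l := (k_n l) \wedge (n - k)$, with $L = L_n$ chosen to be the
smallest integer such that $k_n L_n \ge n - k$. Then applying (S.6) to (S.8) gives

$$E|V_{n,k,l} f|^p \le p! \cdot C^p (n^{-1} k_n)^p \sigma_{n,k}^{2p}(\beta, f)$$

for $l \in \{1, \ldots, L\}$, and

$$\|V_{n,k} f\|_p \le \sum_{l=1}^{L} \|V_{n,k,l} f\|_p \lesssim_C (p!)^{1/p} L_n \, n^{-1} k_n \, \sigma_{n,k}^2(\beta, f) \lesssim (p!)^{1/p} \sigma_{n,k}^2(\beta, f),$$

since $L_n \lesssim n k_n^{-1}$. Thus $V_{n,k} f$ satisfies (S.7); an analogous argument establishes that this
is also true of $U_{n,k} f$. □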
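
The block-counting step used at the end of this proof can be written out explicitly. The following is only a sketch under the assumption $k_n \le n$ (which is not restated in the text at this point); it shows why summing $L_n$ block bounds, each of order $n^{-1} k_n$, costs at most a constant factor.

```latex
% Assumption (labelled as such): k_n <= n.
% L_n is the smallest integer with k_n L_n >= n - k, so k_n (L_n - 1) < n - k.
\begin{align*}
L_n &< \frac{n-k}{k_n} + 1 \le \frac{n}{k_n} + 1 \le \frac{2n}{k_n}, \\
L_n \cdot n^{-1} k_n &\le 2 .
\end{align*}
```

Hence $L_n \lesssim n k_n^{-1}$, and the sum over the $L_n$ blocks is of the same order as the bound required in (S.7).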

Proof of Lemma D.2. This follows exactly as per the proof of Lemma 7.5 in Duffy (2015): 
we need only to note that, in the present case. 


1/2 ,-(3+2/3)/4 < „1/2l 1/4 L-1-/3/2 < 1/2 ,l/2-/3 < ,-/3 


k=0 


k=0 


since kn^ x dl/'^ < el/'^. 


□ 
S.5 Proofs of Lemmas S.1–S.2

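We first recall the following useful inequality, from Lemma 9.1(i) in Duffy (2015):

$$|\hat{f}(\lambda)| \le (|\lambda|^{\beta} \|f\|_{[\beta]}) \wedge \|f\|_1 \tag{S.9}$$

for every $\beta \in (0,1]$ and $f \in \mathrm{BI}_{[\beta]}$; recall $\hat{f}$ denotes the Fourier transform of $f$. We shall
also need the following results, whose proofs appear at the end of this section. Let

$$\vartheta(z_1, z_2) := E[e^{-i z_1 \epsilon_0} - E e^{-i z_1 \epsilon_0}][e^{-i z_2 \epsilon_0} - E e^{-i z_2 \epsilon_0}].$$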

Lemma S.3. There exists a $C < \infty$ such that for every $z_1, z_2 \in \mathbb{R}$,

$$|\vartheta(z_1, z_2)| \lesssim_C [|z_1|^2 \wedge 1]^{1/2} [|z_2|^2 \wedge 1]^{1/2}.$$

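Lemma S.4. There exists a $\gamma_1 > 0$ and a $C < \infty$ such that, for every $p \in [0,5]$, $z_1, z_2 \in \mathbb{R}_+$, and $k_0 \le k \le 2k_n$,

$$\int_{\mathbb{R}} (z_1 |\lambda|^p \wedge z_2) \prod_{l \in \mathcal{K}} |\psi_\epsilon[a_l(\rho_n)\lambda]| \, d\lambda \lesssim_C z_1 k^{-(p+1)/2} + z_2 e^{-\gamma_1 k}$$

uniformly over all $\mathcal{K} \subseteq \{\lfloor k/2 \rfloor, \ldots, k\}$ with $\#\mathcal{K} \ge k/4$.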

Corollary S.l. There exists a 71 > 0 and a C < 00 such that 

(i) for every p G [0,5], zi,Z 2 G M+, k^ <k <kn and 1 <t <n — k, 

[ (zi|A|P A Z 2 )|Ee-^^^‘+i'*+'''*+''| dA zi{k A 

Jr 

(ii) and additionally, for every 2 < s <t, 

f dA (s A /c„)“^/^. 

J M 

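Proof of Lemma S.1. The argument is similar to that used to prove Lemma 7.4 in Duffy
(2015). We first suppose that $k \in \{0, \ldots, k_0 - 1\}$. Trivially, $\|\xi_{k,t}^2 f\|_\infty \lesssim \|f\|_\infty^2$, while by
Jensen's inequality and Lemma S.2(i),

$$\|E_t \xi_{k,t+s}^2 f\|_\infty \lesssim_C \|E_t f^2(x_{t+s+k})\|_\infty \lesssim_{C_1} \begin{cases} \|f\|_\infty^2 & \text{if } 1 \le s \le k_0 - 1, \\ (s \wedge k_n)^{-1/2} \|f\|_2^2 & \text{if } s \ge k_0. \end{cases}$$

Hence, noting that $\sum_{s=1}^{k_n} s^{-1/2} \lesssim k_n^{1/2} \le n k_n^{-1/2}$ and $n k_n^{-1/2} \asymp n d_n^{-1} = e_n$,

$$\sum_{s=1}^{n-k-t} \|E_t \xi_{k,t+s}^2 f\|_\infty \lesssim_C \|f\|_\infty^2 + n k_n^{-1/2} \|f\|_2^2 \lesssim_{C_1} \|f\|_\infty^2 + e_n \|f\|_2^2,$$

as required for (S.5).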
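
The summation behind the bound for (S.5) can be spelled out as follows. This is a sketch only: it assumes $1 \le k_n \le n$ and uses the standard integral comparison $\sum_{s=1}^{m} s^{-1/2} \le 2 m^{1/2}$, neither of which is stated explicitly in the text at this point.

```latex
% Assumptions (labelled as such): 1 <= k_n <= n.
\begin{align*}
\sum_{s=1}^{n} (s \wedge k_n)^{-1/2}
  &\le \sum_{1 \le s \le k_n} s^{-1/2} + \sum_{k_n < s \le n} k_n^{-1/2} \\
  &\le 2 k_n^{1/2} + n k_n^{-1/2} \\
  &\le 3 n k_n^{-1/2},
\end{align*}
% The last step uses k_n^{1/2} <= n k_n^{-1/2}, which is equivalent to k_n <= n.
```

Multiplying through by $\|f\|_2^2$ gives the term of order $n k_n^{-1/2} \|f\|_2^2$, which is then of order $e_n \|f\|_2^2$.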

It remains to consider the case where $k \in \{k_0, \ldots, k_n - 1\}$. We shall obtain a bound
for $E_{t-s} \xi_{k,t}^2 f$, for $t \le n - k$ and $s \in \{1, \ldots, k_n \wedge (t-1)\}$, which depends on $k$
and $s$ but not $t$, thus permitting us to deduce the required bound for $E_t \xi_{k,t+s}^2 f$ in (S.6).
As per (C.3) and (C.4) above, decompose

^t+k — X—oo,0,t+k “ 1 “ Xi^t—l,t+k + Xt-^-l^t+k,t+k^ 


so that by Fourier inversion 

4,t/ = ^tf{xt+k) - ^t-if{xt+k) 

whence 


/ —oo,0,t+fcg l,£ + fc 

^-iXa^et _ jgg-iAafeet 

Jr 

- 


^^-iXxt+i,t+k,t+k 




^—iXiaket _ ]gg—iAiaj,et 


^—iX2aket _ ]gg—iA 2 afc£t 


. ^^-iXiXt+i^t+k,t+k'^^-i^2Xt+i,t+k,t+k dAi dA2. 


Since 1 < s < t — 1, making the further decomposition 


Xl^t—l,t+k s,t+fc “1“ s+l,t—l,t+fc 

and taking conditional expectations on both sides of (S.IO) gives 

Et-sfktf = ttAi ff /(Ai)/(A2)e-'(^i+^2)"— 

X']r 2 

. Ee-i(Ai+A2K_,+i,t_i,t+fe . ^^Xiak, X2ak) 

. ^^-iXiXt+i,t+k,t+k'\^^-i^2Xt+i,t+k,t+k dAi dA2, 

where $\vartheta(z_1, z_2) := E[e^{-i z_1 \epsilon_0} - E e^{-i z_1 \epsilon_0}][e^{-i z_2 \epsilon_0} - E e^{-i z_2 \epsilon_0}]$. Thus by (S.9) and Lemmas C.1(i)
and S.3, there exist $C, C_1 < \infty$ such that

Ei-.d,t/<c [[ |/(Ai)/(A2)|[|Ai|2a1]V2[|A2|2a1]V2 
JJ 

* |]E 0 s+i,t—i,t+fc I 

• I I dAi dA 2 

/'|/(Ai)|2(|AiP Al)|Ee-'^i"^‘+i.‘+''.‘+''| (S.ll) 

Jr 

|]Ee-iAi+A2K_.+i,t_i,t+fe I 


-Cl 


/i 
where we have used $|ab| \le |a|^2 + |b|^2$, and appealed to symmetry (in $\lambda_1$ and $\lambda_2$) to obtain
the final bound. Now by Corollary S.1(ii), and recalling that $s \le k_n$,

J M. >/ M 

(S.12) 

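while (S.9) and Corollary S.1(i) give

$$\begin{aligned} \int_{\mathbb{R}} |\hat{f}(\lambda)|^2 (|\lambda|^2 \wedge 1) |E e^{-i\lambda x_{t+1,t+k,t+k}}| \, d\lambda &\le \int_{\mathbb{R}} [(|\lambda|^{2(1+\beta)} \|f\|_{[\beta]}^2) \wedge \|f\|_1^2] \, |E e^{-i\lambda x_{t+1,t+k,t+k}}| \, d\lambda \\ &\lesssim_C k^{-(3+2\beta)/2} \|f\|_{[\beta]}^2 + e^{-\gamma_1 k} \|f\|_1^2. \end{aligned} \tag{S.13}$$

Together, (S.11)–(S.13) yield

$$\|E_{t-s} \xi_{k,t}^2 f\|_\infty \lesssim_C s^{-1/2} \left( k^{-(3+2\beta)/2} \|f\|_{[\beta]}^2 + e^{-\gamma_1 k} \|f\|_1^2 \right),$$

which does not depend on $t$, and thus applies also to $\|E_t \xi_{k,t+s}^2 f\|_\infty$. Hence

$$\sum_{s=1}^{(n-k-t) \wedge k_n} \|E_t \xi_{k,t+s}^2 f\|_\infty \lesssim_C k_n^{1/2} \left( k^{-(3+2\beta)/2} \|f\|_{[\beta]}^2 + e^{-\gamma_1 k} \|f\|_1^2 \right). \tag{S.14}$$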
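
The way (S.9) and Corollary S.1(i) combine in (S.13) can be made explicit by matching parameters. The sketch below assumes that Corollary S.1(i) delivers a bound of the form $z_1 k^{-(p+1)/2} + z_2 e^{-\gamma_1 k}$ for $k \le k_n$; this reading is an assumption, since the corresponding display is damaged in this copy.

```latex
% Step 1: square (S.9) and absorb the factor (|lambda|^2 ^ 1),
% using (A ^ B)(C ^ 1) <= (AC) ^ B for nonnegative A, B, C.
\begin{align*}
|\hat f(\lambda)|^2 \, (|\lambda|^2 \wedge 1)
  &\le \big[ (|\lambda|^{2\beta} \|f\|_{[\beta]}^2) \wedge \|f\|_1^2 \big] (|\lambda|^2 \wedge 1) \\
  &\le (|\lambda|^{2(1+\beta)} \|f\|_{[\beta]}^2) \wedge \|f\|_1^2 .
\end{align*}
% Step 2: identify the parameters of Corollary S.1(i) (assumed form).
\begin{align*}
p &= 2(1+\beta) \in (2,4] \subset [0,5], &
z_1 &= \|f\|_{[\beta]}^2, &
z_2 &= \|f\|_1^2, \\
\frac{p+1}{2} &= \frac{3+2\beta}{2} .
\end{align*}
```

The exponent $(3+2\beta)/2$ appearing in (S.13) and (S.14) is thus exactly $(p+1)/2$ evaluated at $p = 2(1+\beta)$.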


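We come finally to $\|\xi_{k,t}^2 f\|_\infty$. Returning to (S.10), we have by (S.9) and Corollary S.1(i)
that

$$\begin{aligned} \|\xi_{k,t}^2 f\|_\infty &\lesssim_C \left( \int_{\mathbb{R}} |\hat{f}(\lambda)| \, |E e^{-i\lambda x_{t+1,t+k,t+k}}| \, d\lambda \right)^2 \\ &\lesssim_{C_1} \left( \int_{\mathbb{R}} [(|\lambda|^{\beta} \|f\|_{[\beta]}) \wedge \|f\|_1] \, |E e^{-i\lambda x_{t+1,t+k,t+k}}| \, d\lambda \right)^2 \\ &\lesssim_{C_2} \left( k^{-(1+\beta)/2} \|f\|_{[\beta]} + e^{-\gamma_1 k} \|f\|_1 \right)^2 \\ &\lesssim_{C_3} k_n^{1/2} \left( k^{-(3+2\beta)/2} \|f\|_{[\beta]}^2 + e^{-\gamma_1 k} \|f\|_1^2 \right), \end{aligned} \tag{S.15}$$

where the final bound follows because $k \le k_n$. The result now follows from (S.14) and
(S.15), and the fact that

$$k_n^{1/2} = (n^{-1} k_n) \, n k_n^{-1/2} \asymp (n^{-1} k_n) e_n. \qquad \square$$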

Proof of Lemma S.2. Exactly as in the proof of Lemma 9.3 in Duffy (2015), for $f \in \mathrm{BI}$,

$$E_t |f(x_{t+k})| \lesssim_C \|f\|_1 \int_{\mathbb{R}} |E e^{-i\lambda x_{t+1,t+k,t+k}}| \, d\lambda,$$

while for $f \in \mathrm{BI}_{[\beta]}$,

$$|E_t f(x_{t+k})| \lesssim_C \int_{\mathbb{R}} [(\|f\|_{[\beta]} |\lambda|^{\beta}) \wedge 1] \, |E e^{-i\lambda x_{t+1,t+k,t+k}}| \, d\lambda,$$

whereupon both parts follow by Corollary S.1(i). □

Proof of Lemma S.3. Exactly as in the proof of Lemma 9.4 in Duffy (2015),

$$E|e^{-i\lambda \epsilon_0} - E e^{-i\lambda \epsilon_0}|^2 \lesssim E[|\lambda \epsilon_0|^2 \wedge 1] \lesssim |\lambda|^2$$

since $E \epsilon_0^2 < \infty$. The result now follows by noting that the l.h.s. is also bounded by 4, and
applying the Cauchy–Schwarz inequality. □
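
The final step of this proof can be written out as a short derivation. In the sketch below, $A(z)$ and $C_0$ are labels introduced here for illustration only (they are not notation from the paper), and $\vartheta$ is taken to be the product expectation defined earlier in this section.

```latex
% A(z) and C_0 are hypothetical labels used only for this sketch.
\begin{align*}
A(z) &:= e^{-i z \epsilon_0} - E e^{-i z \epsilon_0}, \qquad
\vartheta(z_1, z_2) = E[A(z_1) A(z_2)], \\
|\vartheta(z_1, z_2)| &\le \big( E|A(z_1)|^2 \big)^{1/2} \big( E|A(z_2)|^2 \big)^{1/2}
  \qquad \text{(Cauchy--Schwarz)}, \\
E|A(z)|^2 &\le (C_0 |z|^2) \wedge 4 \le \max\{C_0, 4\} \, (|z|^2 \wedge 1),
\end{align*}
% so that |theta(z_1,z_2)| <= max{C_0,4} [|z_1|^2 ^ 1]^{1/2} [|z_2|^2 ^ 1]^{1/2}.
```

Here $C_0$ stands for the constant implicit in the bound $E|A(\lambda)|^2 \lesssim |\lambda|^2$, which is finite because $E \epsilon_0^2 < \infty$.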


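Proof of Lemma S.4. Let $h(\lambda) := z_1 |\lambda|^p \wedge z_2$ and $K := \#\mathcal{K}$. By Hölder's inequality,

$$\begin{aligned} \int_{\mathbb{R}} h(\lambda) \prod_{l \in \mathcal{K}} |\psi_\epsilon[a_l(\rho_n)\lambda]| \, d\lambda &\le \prod_{l \in \mathcal{K}} \left( \int_{\mathbb{R}} h(\lambda) |\psi_\epsilon[a_l(\rho_n)\lambda]|^{K} \, d\lambda \right)^{1/K} \\ &\le \max_{l \in \mathcal{K}} \int_{\mathbb{R}} h(\lambda) |\psi_\epsilon[a_l(\rho_n)\lambda]|^{K} \, d\lambda \\ &\le \int_{\mathbb{R}} h(\lambda) \max_{k_0/2 \le l \le 2k_n} |\psi_\epsilon[a_l(\rho_n)\lambda]|^{K} \, d\lambda. \end{aligned}$$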


Further, by Lemma C.l(ii), the preceding is bounded by 

zi [ |A|^e“'^'^^'^ dA + Z 2 e~'^^\\'4>s\\i + Z 2 e~^^^■ 

Jr 

Since K > /c/4, the result follows. 
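
To see how the polynomial rate arises, suppose (as the damaged display above appears to indicate, though this reading is an assumption) that the leading term of the bound has the Gaussian form $z_1 \int_{\mathbb{R}} |\lambda|^p e^{-\gamma \lambda^2 K} \, d\lambda$ for some $\gamma > 0$. The integral can then be evaluated in closed form:

```latex
% Assumption (labelled as such): the leading term is z_1 \int |lambda|^p e^{-gamma lambda^2 K} d lambda.
% Substituting u = gamma K lambda^2 on (0, infinity):
\begin{align*}
\int_{\mathbb{R}} |\lambda|^p e^{-\gamma K \lambda^2} \, d\lambda
  &= 2 \int_0^\infty \lambda^p e^{-\gamma K \lambda^2} \, d\lambda
   = \Gamma\!\left( \tfrac{p+1}{2} \right) (\gamma K)^{-(p+1)/2}, \\
K \ge k/4 \ &\Longrightarrow\ (\gamma K)^{-(p+1)/2} \le (\gamma/4)^{-(p+1)/2} \, k^{-(p+1)/2} .
\end{align*}
```

Since $\Gamma((p+1)/2)$ and $(\gamma/4)^{-(p+1)/2}$ are bounded uniformly over $p \in [0,5]$, the leading term is of order $z_1 k^{-(p+1)/2}$ uniformly in $p$, while terms of the form $z_2 e^{-\gamma K}$ are at most $z_2 e^{-(\gamma/4) k}$.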

Proof of Corollary S.l. Since 


k-l 

'^t+l,t+k,t+k — ^ ^ OjCt+k—l 
1=0 


k+s—1 

l=k+l 


we have 


k/xkfi — 1 
1= \ {k/\kn) I2\ +1 

and so part (i) follows immediately from Lemma S.4. For part (ii), we note that 


□ 


k—l + sf\kn 

l=k+\ 


and so, when s > /cq, the required bound also follows from Lemma S.4. When s < /cq, 
the crude bound |Ee“^'^’^‘-'’+i’*-i’*+'“| < |i/’£(afc+i(pn)A)| suffices, in view of i/e G L^ and 
Lemma C.l(i). □ 
S.6 Index of notation
Greek and Roman symbols 

Listed in (Roman) alphabetical order. Greek symbols are listed according to their English
names: thus $\omega$, as 'omega', appears before $\theta$, as 'theta'.

$a_{t,t-k}$ coefficient sequence . (C.1)

$a_k$ equals $a_{t,t-k}$ for $0 \le k \le t-1$ . (C.2)

AsySz asymptotic size . (2.8)

AsyMaxCP asymptotic maximum coverage probability . (2.9)

BI bounded and integrable functions on $\mathbb{R}$ . App. A

$\mathrm{BI}_{[\beta]}$ $f \in \mathrm{BI}$ with $\|f\|_{[\beta]} < \infty$ . (D.4)

BL bounded and Lipschitz functions on $\mathbb{R}$ . Sec. 1

BIL bounded, integrable and Lipschitz functions on $\mathbb{R}$ . Sec. 1

$C$, $C_1$ generic constants . App. A

$C_n$ confidence set . (2.6)

$\mathrm{CP}_n$ coverage probability . (2.7)

$d_n$ equals $\mathrm{var}(x_n)^{1/2}$ . (3.5)

6ni(3,^) appears in Prop. D.l(ii) . (D.5) 

Et innovation sequence . DGP 2 

Cn norming sequence, equals nd~^ . App. A 

Ej expectation conditional on . App. C.2 

r] mixing variate in limiting variance . (A.3) 

/ Fourier transform of / . App. D 

Fn^i non-predictability test statistic . ( 2 - 11 ) 

Fi a{{er}i=s) . App. C.l 

Qt rr('(x 5 , rig}s<i) . DGP4 

7 nuisance parameters (i/jg, {(/)fc}, ( 7 ^, . Sec. 2.1 

P parameter space for 7 . Sec. 2.1 

h, hn bandwidth . (2-3) 

hn upper and lower bounds defining J^n . H 

J^n set of allowable bandwidths . H 

kn real sequence related to Pn . (C'-7) 

K, Kh smoothing kernel, Kh{x) := h~^K{h~^u) . (2-3) 

Jc standardised OU process . (3-7) 

^ucc(I^) bounded functions with ucc topology . Sec. 3.3 

Lebesgue p-integrable functions on R . Sec. 1 

Cc local time of Jc . (3-6) 
Lip

m 

rhn 

Mn,kf 

Mnf 

</> 

4>k 

<f 

A 

p 

R 

7^ 

7^, 


ST 


7^ 

7^ 

*r 

O'- 


MI 


LU 


tn 

e 

On 

Ut 

l^n,kf 

Vn 

Vt 

Vi,t 

Vn,kf 

W 


Lipschitz continuous functions on M . 

regression function . 

local level estimate of m . 

class of allowable regression functions . 

martingale components in decomposition of Snf 

limiting spatial density under 7Z . 

generic limiting spatial density . 

spatial density estimate . 

remainder from decomposition of Snf . 

limiting stationary density . 

scaling factor. 

sum of the (/>fc’s . 

coefficients defining the linear process vt . 

standard Gaussian density . 

characteristic function of Eq . 

autoregressive parameter . 

parameter space for p . 

T^grp U U . 

stationary sequences in R . 

mildly integrated sequences in R . 

local-to-unity sequences in R . 

asymptotic variance estimator . 

summation operator, Snf '■= Yl't=i fi^t) . 

stationary variance at p < 1 . 

conditional variance of tit . 

estimate of cr^ . 

t-statistic . 

hypothesised value of m{x) . 

sample mean oi yt . 

regression disturbance . 

squared variation of A4n,kf . 

martingale component of . 

linear process built from {et} . 

variance components . 

conditional variance of Ain,kf . 

standard Brownian motion . 


Sec. 1 

( 2 . 1 ) 

(2.3) 

DGPl 

(D.2) 

(3.6) 

(3.1) 

(3.8) 

(D.2) 

(3.6) 

(3.10) 

DGP3 

( 2 . 2 ) 

(3.6) 

DGP2 
( 2 . 2 ) 
DGP3 
Sec. 2.2 
Sec. 2.2 
Sec. 2.2 
Sec. 2.2 
(2.5) 
(D.2) 
App. B.2 

DGP4 

(2.5) 

(2.4) 

Sec. 2.1 
Sec. 2.3 
( 2 . 1 ) 
(D.3) 

( 2 . 10 ) 
( 2 . 2 ) 
(C.5) 
(D.3) 

(3.7) 
$x_t$ regressor, partial sum of $\{v_t\}$ . (2.2)

$x_{n,t}$ standardised regressor . (3.5)

$x_{s,r,t}$ component of $x_t$ . (C.4)

$X_n$ standardised regressor process . Rem. 3.5

$X$ finite-dimensional limit of $X_n$ . Rem. 3.5

$\mathcal{X}$ subset of $\mathbb{R}$ . Sec. 2.3

$\xi_{k,t} f$ martingale difference components of $\mathcal{M}_{n,k} f$ . (D.1)

$y_t$ dependent variable in regression . (2.1)

Symbols not connected to Greek or Roman letters 


Ordered alphabetically by their description. 


$=_d$ both sides have the same distribution

$\to_p$ converges in probability to

$\to_{fdd}$ finite-dimensional convergence . Sec. 1

$\lfloor \cdot \rfloor$ floor function (integer part) . Sec. 1

$\|f\|_{[\beta]}$ Fourier norm . (D.4)

$\lesssim_p$ l.h.s. bounded in probability by the r.h.s. ($a_n \lesssim_p b_n$ if $a_n = O_p(b_n)$) . Sec. 1

$\lesssim_C$ l.h.s. less than $C$ times the r.h.s. ($a \lesssim_C b$ if $a \le Cb$) . Sec. 1

$\lesssim$ l.h.s. of no greater order than the r.h.s. ($a_n \lesssim b_n$ if $a_n = O(b_n)$) . Sec. 1

$\|f\|_{\mathrm{Lip}}$ Lipschitz norm . App. C.2

$\|f\|_p$ $L^p$ norm, for function $f$; denotes $\sup_{x \in \mathbb{R}} |f(x)|$ when $p = \infty$ . App. A

$\|X\|_p$ $L^p$ norm, $(E|X|^p)^{1/p}$ for random variable $X$; denotes essential supremum when $p = \infty$ . App. A

$\sim$ strong asymptotic equivalence ($a_n \sim b_n$ if $\lim_{n \to \infty} a_n / b_n = 1$) . Sec. 1

$\|\mathcal{F}\|$ supremum of norm $\|\cdot\|$ over $\mathcal{F}$, $\sup_{f \in \mathcal{F}} \|f\|$ . App. D

$[a(x)]_{x \in \mathcal{X}}$ vector $(a(x_1), \ldots, a(x_m))'$, for $\{x_1, \ldots, x_m\} = \mathcal{X}$ . Sec. 2.3

$\asymp$ weak asymptotic equivalence ($a_n \asymp b_n$ if $\lim_{n \to \infty} a_n / b_n \in (-\infty, \infty) \setminus \{0\}$) . Sec. 1

$\rightsquigarrow$ weak convergence (van der Vaart and Wellner, 1996) . Sec. 1